How to Detect a Compressed File in C#

I'm trying to write a text search routine that scans a directory for a given wildcard file spec and scans the matches for a given search string. Everything works except for when I get to ZIP files. Here's the relevant code:
string fileText = File.ReadAllText(filePath);
foreach (string s in lstSearchStrings.Items)
{
int cnt = CountSubStrings(fileText, s);
lstCounts.Items.Add(cnt.ToString());
}
I know it only uses text-based routines so I probably need to change that. Any help in where to make changes / what to do would be appreciated!

You can use SharpZipLib to read inside zip files.
using ICSharpCode.SharpZipLib.Zip;
using (var zipFile = new ZipFile(@"test.zip"))
{
foreach (ZipEntry entry in zipFile)
{
Console.WriteLine(entry.Name);
}
}
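First, detect zip files by their extension. Then open them as shown above and read each entry through its stream; File.ReadAllText only accepts paths on disk, so it cannot read an entry inside the archive.
string fileText = new StreamReader(zipFile.GetInputStream(entry)).ReadToEnd();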
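As an alternative to checking the extension, a file can be recognised as a zip by its first four bytes. The sketch below is only an illustration: it uses the built-in System.IO.Compression classes (.NET 4.5 or later) instead of SharpZipLib, and the names ZipSearch, LooksLikeZip, CountOccurrences and CountInZip are hypothetical. It assumes every entry can be read as text.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class ZipSearch
{
    // True when the file starts with the ZIP local file header signature
    // "PK\x03\x04". An empty archive starts with "PK\x05\x06" and is not matched.
    public static bool LooksLikeZip(string path)
    {
        var header = new byte[4];
        using (var stream = File.OpenRead(path))
        {
            if (stream.Read(header, 0, header.Length) < header.Length)
                return false;
        }
        return header[0] == 0x50 && header[1] == 0x4B
            && header[2] == 0x03 && header[3] == 0x04;
    }

    // Counts non-overlapping, case-sensitive occurrences of needle in text.
    public static int CountOccurrences(string text, string needle)
    {
        if (string.IsNullOrEmpty(needle)) return 0;
        int count = 0, index = 0;
        while ((index = text.IndexOf(needle, index, StringComparison.Ordinal)) >= 0)
        {
            count++;
            index += needle.Length;
        }
        return count;
    }

    // Sums the occurrences of needle over all entries, reading each entry as text.
    public static int CountInZip(string zipPath, string needle)
    {
        int total = 0;
        using (var stream = File.OpenRead(zipPath))
        using (var archive = new ZipArchive(stream, ZipArchiveMode.Read))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                using (var reader = new StreamReader(entry.Open()))
                {
                    total += CountOccurrences(reader.ReadToEnd(), needle);
                }
            }
        }
        return total;
    }
}
```

In the search loop from the question, one could call LooksLikeZip(filePath) first and use CountInZip for archives, falling back to File.ReadAllText for everything else.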

Related

Invalid filename retrieved from zip file C#

Hi, I'm trying to read a zip file that contains multiple files. A few of the files inside have Arabic names.
For example, my file name (Final تدريب.pdf) is read as (Final óº⌐∩á.pdf) instead. I hope you can guide me.
Here is my code:
using Ionic.Zip;
using (ZipFile zip = ZipFile.Read(path))
{
foreach (var entry in zip.Entries)
{
string name = entry.FileName;
}
}

Packing files and folders into a single file

I'm trying to create a program that archives some files from a folder into a single binary file so I can read files from the binary archive later.
So I created an archiving method, but I don't really know how I can read the files from the binary without unpacking them...
Some code:
public static void PackFiles()
{
using (var doFile = File.Create("root.extension"))
using (var doBinary = new BinaryWriter(doFile))
{
foreach (var file in Directory.GetFiles("Data"))
{
doBinary.Write(true);
doBinary.Write(Path.GetFileName(file));
var data = File.ReadAllBytes(file);
doBinary.Write(data.Length);
doBinary.Write(data);
}
doBinary.Write(false);
}
}
Also, can I set a kind of "password" on the binary file so the archive can only be unpacked if the password is known?
P.S.: I don't need zip :)

I think the best way to go is to use ZIP for this.
There is a solid and fast library called DotNetZip.
Example using a password:
using (ZipFile zip = new ZipFile())
{
zip.Password= "123456!";
zip.AddFile("ReadMe.txt");
zip.AddFile("7440-N49th.png");
zip.AddFile("2005_Annual_Report.pdf");
zip.Save("Backup.zip");
}
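If the custom format from the question is kept, a single file can be read back without unpacking the rest by walking the records and skipping unwanted payloads. The following is a sketch under the assumption that the archive was written exactly as in PackFiles above (a bool marker, the name, the length, then the bytes, with a final false marker); the names PackReader and ReadPackedFile are hypothetical, and it does not address the password part of the question.

```csharp
using System;
using System.IO;

public static class PackReader
{
    // Returns the bytes stored under fileName, or null when the archive
    // does not contain an entry with that name.
    public static byte[] ReadPackedFile(string archivePath, string fileName)
    {
        using (var stream = File.OpenRead(archivePath))
        using (var reader = new BinaryReader(stream))
        {
            // Each record starts with 'true'; a single 'false' ends the archive.
            while (reader.ReadBoolean())
            {
                string name = reader.ReadString();   // written by BinaryWriter.Write(string)
                int length = reader.ReadInt32();     // payload size in bytes
                if (string.Equals(name, fileName, StringComparison.OrdinalIgnoreCase))
                    return reader.ReadBytes(length);

                // Not the file we want: jump over its payload without reading it.
                reader.BaseStream.Seek(length, SeekOrigin.Current);
            }
        }
        return null;
    }
}
```

Because the payload length is stored before the data, skipping a record costs one seek, so looking up a file near the end of a large archive does not require reading the earlier files into memory.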

Reading contents from several files and writing to one file

In my application there is a situation like this. Before creating a file, my application searches a directory for files with a particular file name. If any files are found, it should read the contents of each file and write them to a new file. I have googled a lot and tried things like this:
string temp_file_format = "ScriptLog_" + DateTime.Now.ToString("dd_MM_yyyy_HH");
string[] files = Directory.GetFiles(path,temp_file_format);
foreach (FileAccess finfo in files)
{
string text = File.ReadAllText(finfo);
}
and
System.IO.DirectoryInfo dir = new DirectoryInfo(path);
System.IO.FileInfo[] files = dir.GetFiles(temp_file_format);
foreach (FileInfo finfo in files)
{
finfo.OpenRead();
}
But all of these failed. Can anyone show me an alternative?
Is there anything wrong with my temp_file_format string?
It would be nice if I could prepend these contents to the new file, but if not, no worries.
Any help would be really appreciated.

This is a complete working implementation that does all of that
without reading everything into memory at one time (which doesn't work for large files)
without keeping any files open for longer than required
using System.IO;
using System.Linq;
public static class Program {
public static void Main()
{
// File.ReadLines streams each file lazily, one line at a time
var all = Directory.GetFiles("/tmp", "*.cpp")
.SelectMany(file => File.ReadLines(file));
using (var w = new StreamWriter("/tmp/output.txt"))
foreach(var line in all)
w.WriteLine(line);
}
}
I tested it on Mono 2.10, and it should work on any .NET 4.0+ (needed for File.ReadLines, which is a lazy line-wise enumerable; File.ReadAllLines would load each whole file into an array first)

Here's a short snippet that reads all the files and outputs their lines to the path outputPath
var lines = from file in Directory.GetFiles(path,temp_file_format)
from line in File.ReadAllLines(file)
select line;
File.WriteAllLines(outputPath, lines);
The problem with your code is not really related to reading files; you are simply trying to use an object as a type that it is not. Directory.GetFiles returns an array of strings, and File.ReadXXX and File.OpenRead expect the path as a string. So you simply need to pass each of the returned strings as the path argument to the appropriate method. The above is one such example. I hope it helps both to solve your problem and to explain the actual issue with your code.

Try this:
foreach (FileInfo finfo in files)
{
try
{
// pass the file's path, not the literal string "finfo "
using (StreamReader sr = new StreamReader(finfo.FullName))
{
String line = sr.ReadToEnd();
Console.WriteLine(line);
}
}
catch (Exception e)
{
Console.WriteLine("The file could not be read:");
Console.WriteLine(e.Message);
}
}

using (var output = File.Create(outputPath))
{
foreach (var file in Directory.GetFiles(InputPath,temp_file_format))
{
using (var input = File.OpenRead(file))
{
input.CopyTo(output);
}
}
}

DotNetZip: How to extract files, but ignoring the path in the zipfile?

I'm trying to extract files to a given folder while ignoring the path in the zip file, but there doesn't seem to be a way.
This seems a fairly basic requirement given all the other good stuff implemented in there.
What am I missing?
The code is:
using (Ionic.Zip.ZipFile zf = Ionic.Zip.ZipFile.Read(zipPath))
{
zf.ExtractAll(appPath);
}

While you can't specify it for a specific call to Extract() or ExtractAll(), the ZipFile class has a FlattenFoldersOnExtract property. When set to true, it flattens all the extracted files into one folder:
var flattenFoldersOnExtract = zip.FlattenFoldersOnExtract;
zip.FlattenFoldersOnExtract = true;
zip.ExtractAll(appPath);
zip.FlattenFoldersOnExtract = flattenFoldersOnExtract;

You'll need to remove the directory part of the filename just prior to unzipping...
using (var zf = Ionic.Zip.ZipFile.Read(zipPath))
{
zf.ToList().ForEach(entry =>
{
entry.FileName = System.IO.Path.GetFileName(entry.FileName);
entry.Extract(appPath);
});
}

You can use the overload that takes a stream as a parameter. This way you have full control over the path the files are extracted to.
Example:
using (ZipFile zip = new ZipFile(ZipPath))
{
foreach (ZipEntry e in zip)
{
string newPath = Path.Combine(FolderToExtractTo, e.FileName);
if (e.IsDirectory)
{
Directory.CreateDirectory(newPath);
}
else
{
using (FileStream stream = new FileStream(newPath, FileMode.Create))
e.Extract(stream);
}
}
}

That will fail if there are two files with the same file name. For example:
files\additionalfiles\file1.txt
temp\file1.txt
The first file will be renamed to file1.txt in the zip file, and when the second file is renamed an exception is thrown saying that an item with the same key already exists.
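One way around the name clash described above is to leave the entries untouched and pick a free file name at extraction time. This is a sketch, not DotNetZip's API: it uses the built-in System.IO.Compression classes (.NET 4.5 or later), and the names FlatExtractor, ExtractFlat and UniquePath are hypothetical.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class FlatExtractor
{
    // Extracts every file entry directly into targetDir, dropping the folder
    // part of the entry name. If the name is already taken in targetDir,
    // the new file gets a " (n)" suffix instead of overwriting.
    public static void ExtractFlat(string zipPath, string targetDir)
    {
        Directory.CreateDirectory(targetDir);
        using (var stream = File.OpenRead(zipPath))
        using (var archive = new ZipArchive(stream, ZipArchiveMode.Read))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                // Keep only the part after the last '/' or '\'.
                string name = entry.FullName;
                int cut = name.LastIndexOfAny(new[] { '/', '\\' });
                if (cut >= 0) name = name.Substring(cut + 1);

                // Folder entries end with a separator and leave an empty name.
                if (name.Length == 0 || name == "." || name == "..") continue;

                string target = UniquePath(targetDir, name);
                using (Stream source = entry.Open())
                using (Stream destination = File.Create(target))
                {
                    source.CopyTo(destination);
                }
            }
        }
    }

    // Returns dir\fileName, or dir\stem (n).ext for the first free n.
    private static string UniquePath(string dir, string fileName)
    {
        string candidate = Path.Combine(dir, fileName);
        string stem = Path.GetFileNameWithoutExtension(fileName);
        string ext = Path.GetExtension(fileName);
        for (int n = 1; File.Exists(candidate); n++)
        {
            candidate = Path.Combine(dir, stem + " (" + n + ")" + ext);
        }
        return candidate;
    }
}
```

With the two entries from the example, the first would be written as file1.txt and the second as file1 (1).txt. Whether renaming or overwriting is the right policy depends on the application.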

Extracting files from a Zip archive programmatically using C# and System.IO.Packaging

I have a bunch of ZIP files that are in desperate need of some hierarchical reorganization and extraction. What I can do, currently, is create the directory structure and move the zip files to the proper location. The mystic cheese that I am missing is the part that extracts the files from the ZIP archive.
I have seen the MSDN articles on the ZipArchive class and understand them reasonably well. I have also seen the VBScript ways to extract. This is not a complex class, so extracting stuff should be pretty simple. In fact, it works "mostly". I have included my current code below for reference.
using (ZipPackage package = (ZipPackage)Package.Open(@"..\..\test.zip", FileMode.Open, FileAccess.Read))
{
PackagePartCollection packageParts = package.GetParts();
foreach (PackageRelationship relation in packageParts)
{
//Do Stuff but never gets here since packageParts is empty.
}
}
The problem seems to be somewhere in GetParts (or GetAnything, for that matter). It seems that the package, while open, is empty. Digging deeper, the debugger shows that the private member _zipArchive actually has parts. Parts with the right names and everything. Why won't the GetParts function retrieve them? I've tried casting the open package to a ZipArchive and that didn't help. Grrr.

If you are manipulating ZIP files, you may want to look into a 3rd-party library to help you.
For example, DotNetZip, which has been recently updated. The current version is now v1.8. Here's an example to create a zip:
using (ZipFile zip = new ZipFile())
{
zip.AddFile("c:\\photos\\personal\\7440-N49th.png");
zip.AddFile("c:\\Desktop\\2005_Annual_Report.pdf");
zip.AddFile("ReadMe.txt");
zip.Save("Archive.zip");
}
Here's an example to update an existing zip; you don't need to extract the files to do it:
using (ZipFile zip = ZipFile.Read("ExistingArchive.zip"))
{
// 1. remove an entry, given the name
zip.RemoveEntry("README.txt");
// 2. Update an existing entry, with content from the filesystem
zip.UpdateItem("Portfolio.doc");
// 3. modify the filename of an existing entry
// (rename it and move it to a sub directory)
ZipEntry e = zip["Table1.jpg"];
e.FileName = "images/Figure1.jpg";
// 4. insert or modify the comment on the zip archive
zip.Comment = "This zip archive was updated " + System.DateTime.Now.ToString("G");
// 5. finally, save the modified archive
zip.Save();
}
Here's an example that extracts entries:
using (ZipFile zip = ZipFile.Read("ExistingZipFile.zip"))
{
foreach (ZipEntry e in zip)
{
e.Extract(TargetDirectory, true); // true => overwrite existing files
}
}
DotNetZip supports multi-byte characters in filenames, Zip encryption, AES encryption, streams, Unicode, and self-extracting archives.
It also supports ZIP64, for file lengths greater than 0xFFFFFFFF or for archives with more than 65535 entries.
It is free and open source.
It was originally available from CodePlex or as a direct download from windows.net; CodePlex has since been discontinued and archived.

From MSDN,
In this sample, the Package class is used (as opposed to the ZipPackage.) Having worked with both, I've only seen flakiness happen when there's corruption in the zip file. Not necessarily corruption that throws the Windows extractor or Winzip, but something that the Packaging components have trouble handling.
Hope this helps, maybe it can provide you an alternative to debugging the issue.
using System;
using System.IO;
using System.IO.Packaging;
using System.Text;
class ExtractPackagedImages
{
static void Main(string[] paths)
{
foreach (string path in paths)
{
using (Package package = Package.Open(
path, FileMode.Open, FileAccess.Read))
{
DirectoryInfo dir = Directory.CreateDirectory(path + " Images");
foreach (PackagePart part in package.GetParts())
{
if (part.ContentType.ToLowerInvariant().StartsWith("image/"))
{
string target = Path.Combine(
dir.FullName, CreateFilenameFromUri(part.Uri));
using (Stream source = part.GetStream(
FileMode.Open, FileAccess.Read))
using (Stream destination = File.OpenWrite(target))
{
byte[] buffer = new byte[0x1000];
int read;
while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
{
destination.Write(buffer, 0, read);
}
}
Console.WriteLine("Extracted {0}", target);
}
}
}
}
Console.WriteLine("Done");
}
private static string CreateFilenameFromUri(Uri uri)
{
char [] invalidChars = Path.GetInvalidFileNameChars();
StringBuilder sb = new StringBuilder(uri.OriginalString.Length);
foreach (char c in uri.OriginalString)
{
sb.Append(Array.IndexOf(invalidChars, c) < 0 ? c : '_');
}
return sb.ToString();
}
}

From "ZipPackage Class" (MSDN):
While Packages are stored as Zip files* through the ZipPackage class, all Zip files are not ZipPackages. A ZipPackage has special requirements such as URI-compliant file (part) names and a "[Content_Types].xml" file that defines the MIME types for all the files contained in the Package. The ZipPackage class cannot be used to open arbitrary Zip files that do not conform to the Open Packaging Conventions standard.
For further details see Section 9.2 "Mapping to a ZIP Archive" of the ECMA International "Open Packaging Conventions" standard, http://www.ecma-international.org/publications/files/ECMA-ST/Office%20Open%20XML%20Part%202%20(DOCX).zip (342Kb) or http://www.ecma-international.org/publications/files/ECMA-ST/Office%20Open%20XML%20Part%202%20(PDF).zip (1.3Mb)
*You can simply add ".zip" to the extension of any ZipPackage-based file (.docx, .xlsx, .pptx, etc.) to open it in your favorite Zip utility.

I was having the exact same problem! To get the GetParts() method to return something, I had to add the [Content_Types].xml file to the root of the archive with a "Default" node for every file extension included. Once I added this (just using Windows Explorer), my code was able to read and extract the archived contents.
More information on the [Content_Types].xml file can be found here:
http://msdn.microsoft.com/en-us/magazine/cc163372.aspx - There is an example file below Figure 13 of the article.
var zipFilePath = "c:\\myfile.zip";
var tempFolderPath = "c:\\unzipped";
using (Package package = ZipPackage.Open(zipFilePath, FileMode.Open, FileAccess.Read))
{
foreach (PackagePart part in package.GetParts())
{
var target = Path.GetFullPath(Path.Combine(tempFolderPath, part.Uri.OriginalString.TrimStart('/')));
var targetDir = target.Remove(target.LastIndexOf('\\'));
if (!Directory.Exists(targetDir))
Directory.CreateDirectory(targetDir);
using (Stream source = part.GetStream(FileMode.Open, FileAccess.Read))
{
FileStream targetFile = File.OpenWrite(target);
source.CopyTo(targetFile);
targetFile.Close();
}
}
}
Note: this code uses the Stream.CopyTo method in .NET 4.0

I agree with Cheeso. System.IO.Packaging is awkward when handling generic zip files, seeing as it was designed for Office Open XML documents. I'd suggest using DotNetZip or SharpZipLib.

(This is basically a rephrasing of this answer)
It turns out that System.IO.Packaging.ZipPackage doesn't support PKZIP, which is why no "parts" are returned when you open a "generic" ZIP file. This class only supports a specific flavor of ZIP files (see the comments at the bottom of the MSDN description), used among other things for Windows Azure service packages up to SDK 1.6. That is why a service package becomes invalid if you unpack it and then repack it with, say, the Info-ZIP packer.
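For generic ZIP files without a [Content_Types].xml, the System.IO.Compression.ZipArchive class (available from .NET 4.5) reads plain archives without a third-party library. The sketch below is an assumption-laden illustration rather than a drop-in replacement for the code in the question: the names PlainZipExtractor and ExtractAll are hypothetical, and existing files in the target folder are overwritten.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class PlainZipExtractor
{
    // Extracts all entries of a plain zip file below targetDir, keeping the
    // folder structure. Returns the number of files written.
    public static int ExtractAll(string zipPath, string targetDir)
    {
        string root = Path.GetFullPath(targetDir);
        string separator = Path.DirectorySeparatorChar.ToString();
        if (!root.EndsWith(separator, StringComparison.Ordinal))
            root += separator;

        int count = 0;
        using (var stream = File.OpenRead(zipPath))
        using (var archive = new ZipArchive(stream, ZipArchiveMode.Read))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                string target = Path.GetFullPath(Path.Combine(root, entry.FullName));

                // Refuse entries such as "../x" that would land outside targetDir.
                if (!target.StartsWith(root, StringComparison.Ordinal))
                    throw new IOException("Entry escapes target folder: " + entry.FullName);

                // Folder entries end with '/' and carry no data.
                if (entry.FullName.EndsWith("/", StringComparison.Ordinal))
                {
                    Directory.CreateDirectory(target);
                    continue;
                }

                Directory.CreateDirectory(Path.GetDirectoryName(target));
                using (Stream source = entry.Open())
                using (Stream destination = File.Create(target))
                {
                    source.CopyTo(destination);
                }
                count++;
            }
        }
        return count;
    }
}
```

On .NET Framework this needs a reference to the System.IO.Compression assembly. The check against the target root is a precaution for archives from untrusted sources.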
