How to remove zip compression from an XML file? - c#

I have an XML file in the .zfo format that is compressed using the zip algorithm. I need to remove this compression so that the file is usable as plain XML. Here is the file.
How can I remove this compression, or decompress this XML file?
It's not what you might imagine, i.e. a .zip file containing an XML file. Instead, the byte[] that's written to the file is zip-compressed.
Thanks in advance.

That file isn't zip compressed at all. It appears to be some xml that's embedded in a certificate, issued by the Czech Post Office. The actual message looks to be encoded in some kind of base64 variant.
Call your post office.

Check out DotNetZip (http://www.codeplex.com/DotNetZip); it probably does what you need (e.g., DeflateStream).
A zip file contains metadata (file and directory structure) as well as the actual compressed data. It sounds like your file only has the compressed data. DotNetZip should be able to handle both.
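If the file really holds only a raw deflate stream, a minimal sketch of inflating it could look like the following. It uses the built-in System.IO.Compression.DeflateStream rather than DotNetZip's class of the same name, and the RawDeflate class and its method names are made up for illustration. It assumes raw deflate data (RFC 1951); data with a zlib or gzip wrapper would need a different stream.

```csharp
using System.IO;
using System.IO.Compression;

static class RawDeflate
{
    // Inflates a buffer that holds a raw deflate stream (no zip, zlib or gzip headers).
    public static byte[] Inflate(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var inflater = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            inflater.CopyTo(output);
            return output.ToArray();
        }
    }

    // Counterpart that produces raw deflate data, handy for checking the round trip.
    public static byte[] Deflate(byte[] plain)
    {
        using (var output = new MemoryStream())
        {
            using (var deflater = new DeflateStream(output, CompressionMode.Compress, true))
            {
                deflater.Write(plain, 0, plain.Length);
            }
            return output.ToArray();
        }
    }
}
```

If Inflate throws an InvalidDataException, the data is most likely not raw deflate at all, which would match the other answer's observation about this particular file.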

Related

Using StreamReader and StreamWriter for TGZ file copied from Solaris

We have a very old file delivery application (IPGear, if you have heard of it, written in Tcl). We upload our IP files there and our customers download them from the system.
When you upload a file to this application, it adds an .RCA extension to the uploaded file and adds some metadata to it. If we view the content of any file (usually tgz, pdf and text files) in a text editor, we see some metadata added to the top of the file by the application (5-10 lines, readable).
When you download a file from the system, it somehow strips this metadata and returns a TGZ file that works fine (we can extract it).
If we find the RCA file on the storage where this application keeps its files and remove the added metadata with a text editor, we are able to extract the file without any problem, which is fine too. But we need to do this for 22K files, so we need to script it.
We are able to find the bits the application adds by opening the file with StreamReader, strip the metadata, and write the file to disk with StreamWriter. However, the file we write is corrupted if it is a TGZ file. If we do the same thing for text files, they work.
The content of the tgz file looks like this when we open it in a text editor:
The bits on lines 29-38 are the metadata we strip.
It looks like StreamReader is not able to write this content back to disk, even though we tried different encoding settings.
One other note: the file we are trying to read and write was copied from a Solaris-based server to a local machine (Windows 7) via WinSCP.
So, my question is: what is the best way of reading a TGZ file into memory (as text) for manipulation, and saving it back without corruption? Are StreamReader and StreamWriter not good for this purpose?
I tried to give as much information as I can; please add comments if you need more clarification.
It looks like StreamReader is not able to write this content back to disk, even though we tried different encoding settings.
Yes, because a tgz file isn't plain text. StreamReader and StreamWriter are for text content, not arbitrary binary content.
So, my question is: what is the best way of reading a TGZ file into memory (as text)
You don't. You read it as binary data, because it is binary data.
If the TGZ archive contains text files, you'll need to decompress the TGZ to the TAR format, then extract the relevant data from that. Then you can work with it as text. Before that point, it's just binary data.
But it sounds like you actually may just want to read text information before the TGZ file... in which case you need to work out where that text information ends, and not read any of the TGZ file as text (because it's not). This is non-trivial, but if you know that the text is in ASCII it'll be a bit easier - you will need to work out how to detect the end of the text and the start of the real content though, and we can't really tell that from the screenshot you've given.
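As a rough sketch of the binary approach: read the whole file as bytes, locate where the gzip data starts, and write only that part back. This assumes the metadata is plain ASCII and the payload is a gzip stream beginning with the bytes 1F 8B 08; neither is confirmed by the question. RcaStripper and its method names are hypothetical.

```csharp
using System;
using System.IO;

static class RcaStripper
{
    // Returns the offset of the first gzip header (1F 8B 08), or -1 if there is none.
    // ASCII text never contains the byte 0x8B, so an ASCII header cannot produce a false match.
    public static int FindGzipStart(byte[] data)
    {
        for (int i = 0; i + 2 < data.Length; i++)
        {
            if (data[i] == 0x1F && data[i + 1] == 0x8B && data[i + 2] == 0x08)
                return i;
        }
        return -1;
    }

    // Returns a copy of the data with everything before the gzip header removed.
    public static byte[] StripHeader(byte[] data)
    {
        int start = FindGzipStart(data);
        if (start < 0)
            throw new InvalidDataException("No gzip header found.");

        var payload = new byte[data.Length - start];
        Array.Copy(data, start, payload, 0, payload.Length);
        return payload;
    }
}
```

For a file on disk this would be used as File.WriteAllBytes(outputPath, RcaStripper.StripHeader(File.ReadAllBytes(inputPath))), where both paths are placeholders. No text encoding is involved at any point, so the archive bytes are written back unchanged.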

Efficiently finding the segment that has undergone changes recently in a Docx File

I am developing an application which takes backups of Docx files. For the initial backup I copy the entire file to the destination, but the next time I want to perform an incremental backup, i.e. I want to back up only the segment of the Docx file that has changed. I need to find the most efficient way to do this.
I would be really thankful for any help in this regard.
The DOCX format differs from the DOC format used by earlier versions of Microsoft Word: whereas a DOC file uses a text or binary format for storing a document, a DOCX file is based on XML and uses ZIP compression for a smaller file size. In other words, a DOCX file is a set of XML files that have been compressed using ZIP.
It might help to use ZipFile to open the package, determine which of the files inside has actually changed, and then save only those changes incrementally in your VCS.
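One way this could be sketched is to hash every part inside the old and new package and compare the results. The example uses the built-in System.IO.Compression.ZipArchive (not necessarily the ZipFile class meant above); DocxDiff and its method names are invented for illustration. Parts that were deleted in the new version are not reported by this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Security.Cryptography;

static class DocxDiff
{
    // Maps each part name in the package to a SHA-256 hash of its uncompressed content.
    public static Dictionary<string, string> HashParts(Stream docx)
    {
        var hashes = new Dictionary<string, string>();
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, true))
        using (var sha = SHA256.Create())
        {
            foreach (ZipArchiveEntry entry in zip.Entries)
            {
                using (Stream content = entry.Open())
                {
                    hashes[entry.FullName] = Convert.ToBase64String(sha.ComputeHash(content));
                }
            }
        }
        return hashes;
    }

    // Lists the parts that are new or whose content differs from the old package.
    public static List<string> ChangedParts(Stream oldDocx, Stream newDocx)
    {
        Dictionary<string, string> before = HashParts(oldDocx);
        Dictionary<string, string> after = HashParts(newDocx);

        var changed = new List<string>();
        foreach (KeyValuePair<string, string> part in after)
        {
            string oldHash;
            if (!before.TryGetValue(part.Key, out oldHash) || oldHash != part.Value)
                changed.Add(part.Key);
        }
        return changed;
    }
}
```

Comparing uncompressed content rather than the raw file bytes matters here, because re-saving a document can change the compressed bytes of the whole package even when only one part was edited.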

C# Save byte array as xml file

I'm receiving a .zip file from a server.
The .zip file is sent Base64-encoded and it contains an XML file.
After I decode the data to binary using Convert.FromBase64String, can I convert the byte array to XML?
I don't want to deal with unzipping.
I tried the following code, which resulted in gibberish that doesn't make any sense and doesn't look like XML at all:
XmlDocument doc = new XmlDocument();
string xml = Encoding.UTF8.GetString(buffer);
doc.LoadXml(xml);
Any ideas?
You say you don't want to unzip, but do you actually mean that you don't want to unzip to disc? Most zip libraries either allow you to unzip a file to a byte array directly or to a stream where you could pass it a MemoryStream.
There's no getting around having to uncompress, unless you have control over the server side. In that case you could change the format to an uncompressed one (like a tar file), and then you wouldn't have to uncompress.
You say:
I'm receiving a .zip file from a server.
And:
I don't want to deal with unzipping.
Well. You have to. If the data is in a zip archive, you need to extract it first. You can't just ignore the fact.
There are plenty of zip libraries; SharpZipLib is free and easy enough to use.
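For what it's worth, the unzipping can happen entirely in memory. A minimal sketch, using the built-in System.IO.Compression.ZipArchive (available from .NET 4.5) instead of SharpZipLib, and assuming the XML file is the first entry in the archive; ZippedXml is a made-up helper name:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Xml;

static class ZippedXml
{
    // Decodes a Base64 string that holds a zip archive and loads its first entry as XML.
    public static XmlDocument Load(string base64Zip)
    {
        byte[] zipBytes = Convert.FromBase64String(base64Zip);

        using (var buffer = new MemoryStream(zipBytes))
        using (var zip = new ZipArchive(buffer, ZipArchiveMode.Read))
        {
            if (zip.Entries.Count == 0)
                throw new InvalidDataException("The archive is empty.");

            var doc = new XmlDocument();
            // Assumption: the XML document is the first (or only) entry.
            using (Stream entryStream = zip.Entries[0].Open())
            {
                doc.Load(entryStream);
            }
            return doc;
        }
    }
}
```

Nothing is written to disk, and passing the stream to XmlDocument.Load lets the XML parser detect the encoding instead of assuming UTF-8.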

How to Decompress nested GZip (TGZ) files in C#

I am receiving a TGZ file that will contain one plain text file along with possibly one or more nested TGZ files. I have figured out how to decompress the main TGZ file and read the plain text file contained in it, but I have not been able to figure out how to recognize and decompress the nested TGZ files. Has anyone come across this problem before?
Also, I do not have control over the file I am receiving, so I cannot change the format of a TGZ file containing nested TGZ files. One other caveat (even though I don't think it matters) is that these files are being compressed and tarred in a Unix or Linux environment.
Thanks in advance for any help.
Try the SharpZipLib (http://www.icsharpcode.net/OpenSource/SharpZipLib/Download.aspx) free library.
It lets you work with TGZ and has methods to test files before trying to inflate them; so you can either rely on the file extensions being correct, or test them individually to see if you can read them as compressed files - then inflate them once the main file has been decompressed.
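If you would rather test the content than trust the extension, a simple check is to look for the gzip magic bytes (1F 8B) at the start of each extracted entry. The sketch below uses the built-in GZipStream instead of SharpZipLib; GZipProbe and its methods are hypothetical names. Gunzip only removes the gzip layer, so its result is the inner tar data, which still has to be read with a tar library.

```csharp
using System.IO;
using System.IO.Compression;

static class GZipProbe
{
    // True if the buffer starts with the two-byte gzip signature.
    public static bool LooksLikeGZip(byte[] data)
    {
        return data != null && data.Length >= 2 && data[0] == 0x1F && data[1] == 0x8B;
    }

    // Removes the gzip layer; for a .tgz entry the result is the raw tar archive.
    public static byte[] Gunzip(byte[] data)
    {
        using (var input = new MemoryStream(data))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

After extracting the outer archive, each entry whose bytes pass LooksLikeGZip can be run through the same decompress-and-extract routine again, which handles any depth of nesting.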
To read and write .tar and .tgz (or .tar.gz ) files from .NET, you can use this one-file tar class:
http://cheesoexamples.codeplex.com/SourceControl/changeset/view/97756#1868643
Very simple usage. To create an archive:
string[] filenames = { ... };
Ionic.Tar.CreateArchive("archive.tar", filenames);
Create a compressed (gzip'd) tar archive:
string[] filenames = { ... };
Ionic.Tar.CreateArchive("archive.tgz", filenames, TarOptions.Compress);
Read a tar archive:
var entries = Ionic.Tar.List("archive.tar"); // also handles .tgz files
Extract all entries in a tar archive:
var entries = Ionic.Tar.Extract("archive.tar"); // also handles .tgz files
Take a look at DotNetZip on CodePlex.
"If all you want is a better DeflateStream or GZipStream class to replace the one that is built-into the .NET BCL, that is here, too. DotNetZip's DeflateStream and GZipStream are available in a standalone assembly, based on a .NET port of Zlib. These streams support compression levels and deliver much better performance than the built-in classes. There is also a ZlibStream to complete the set (RFC 1950, 1951, 1952)."
It appears that you can iterate through the compressed file and pull the individual files out of the archive. You can then test the files you uncompressed and see if any of them are themselves GZip files.
Here is a snippet from their Examples page:
using (ZipFile zip = ZipFile.Read(ExistingZipFile))
{
    foreach (ZipEntry e in zip)
    {
        e.Extract(OutputStream);
    }
}
Keith

Figure out how it is compressed?

I have an archive (I know it uses some game compression) and I am trying to figure out how it is compressed so I can add files to it using C#. It opens and works in 7-Zip and WinRAR, but when I use the ZipForge/ComponentAce archive library it says Invalid File.
Any help?
Steve
Have you opened the file up in a binary editor to see if the first few bytes denote the format?
For example, ZIP files have the header format given here: http://www.ta7.de/txt/computer/computer016.htm
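To check the first bytes from code rather than a binary editor, something along these lines might do. It compares the start of the file against a few well-known signatures (ZIP local file header, 7z, RAR, gzip). The list is far from exhaustive, and ArchiveSniffer is just an illustrative name.

```csharp
using System;
using System.Collections.Generic;

static class ArchiveSniffer
{
    // A few common archive signatures ("magic numbers") found at offset 0.
    static readonly KeyValuePair<string, byte[]>[] Signatures =
    {
        new KeyValuePair<string, byte[]>("zip",  new byte[] { 0x50, 0x4B, 0x03, 0x04 }),
        new KeyValuePair<string, byte[]>("7z",   new byte[] { 0x37, 0x7A, 0xBC, 0xAF, 0x27, 0x1C }),
        new KeyValuePair<string, byte[]>("rar",  new byte[] { 0x52, 0x61, 0x72, 0x21, 0x1A, 0x07 }),
        new KeyValuePair<string, byte[]>("gzip", new byte[] { 0x1F, 0x8B }),
    };

    // Returns the name of the first matching format, or "unknown".
    public static string Identify(byte[] header)
    {
        foreach (KeyValuePair<string, byte[]> signature in Signatures)
        {
            if (StartsWith(header, signature.Value))
                return signature.Key;
        }
        return "unknown";
    }

    static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data == null || data.Length < prefix.Length)
            return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i])
                return false;
        }
        return true;
    }
}
```

Reading the first eight bytes of the file into a buffer and passing them to Identify is enough for these signatures. If the result is "unknown" even though 7-Zip opens the file, the format is probably one of the many others 7-Zip supports.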
What extension does the file have?
