Hello, I am trying to compress a file using GZipStream.
I have created my own extension, let's call it .myextension
I am trying to compress a .myextension file while keeping its extension; that is, the compressed output should also end in .myextension. For example, I have myfile.myextension and
I want to compress it to myfile.myextension. That part works: I can compress my file really well.
The problem is that when I try to decompress it using GZipStream it says that the magic number is incorrect.
How can I fix that? When decompressing, should I just change the extension to .gz? Should I convert it somehow? Please help; I have no idea how to continue.
This is a common question. Here are similar threads with solutions:
http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=427166&SiteID=1
A 'magic number' is usually a fixed value that often appears arbitrary, possibly indecipherable. For example, a line of code may contain:
If X = 31 Then
    'Do something
End If
In this case, 31 is a 'magic number': it has no obvious meaning (and as far as coding is concerned, the term is one of derision).
Files (of different types) often have their first few bytes set to certain values. For example, a file whose first two bytes are the hexadecimal values 42 4D is a Bitmap file. These values are 'magic numbers' (in this case, 42 4D corresponds to the characters BM). Other file types have similar 'magic numbers'.
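As a small illustration of how such a check might look in C# (the helper name and path handling here are made up for the example), you can read the first bytes of a file and compare them against a known magic number:

using System.IO;

static bool LooksLikeBitmap(string path)
{
    using (FileStream fs = File.OpenRead(path))
    {
        // A Bitmap file starts with the two 'magic' bytes 0x42 0x4D ("BM").
        return fs.ReadByte() == 0x42 && fs.ReadByte() == 0x4D;
    }
}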
http://forums.microsoft.com/msdn/showpost.aspx?postid=1154042&siteid=1
Of course, the minute someone (or some team) develops a no-fuss compression/decompression custom task which supports zip, bzip2, gzip, rar, cab, jar, data and iso files, I'll use that; until that time, I'll stick with the open-source command-line utilities.
Of course, you can code up a solution, but this one is such low-hanging fruit. For handling zip files, there is no native .NET library (at least not yet). There is support for handling the compressed streams INSIDE the zip file, but not for navigating the archive itself.
Now, as I mentioned previously, there are plenty of open-source zip utilities, like those on SourceForge. These work fine on Win2003 Server x64; I can attest to that.
However, if you're insistent on a .NET solution for zip decompression, use http://www.icsharpcode.net/OpenSource/SharpZipLib/, which is open source, and which has a clean and reliable 100% .NET implementation.
First off, judging from other users who have had various issues, GZipStream should not be used, since it has bugs: it does not compress short strings correctly, and it does not detect corrupted compressed data. It is a very poor implementation.
As for your problem, others using GZipStream have seen a four-byte prefix before the gzip data, which is the number of uncompressed bytes. If that prefix is written to the file, it would cause the problem you are seeing. A gzip file should start with the hex bytes 1f 8b.
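For reference, here is a minimal round-trip sketch (file paths are placeholders; it assumes .NET 4 or later for Stream.CopyTo) that writes nothing to the output except the raw GZipStream data, so the compressed file begins with the 1f 8b magic bytes and decompresses the same way no matter what extension you give it:

using System.IO;
using System.IO.Compression;

// Compress sourcePath into destinationPath; nothing is written before the
// GZipStream output, so the file starts with the gzip magic bytes 1f 8b.
static void Compress(string sourcePath, string destinationPath)
{
    using (FileStream input = File.OpenRead(sourcePath))
    using (FileStream output = File.Create(destinationPath))
    using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
    {
        input.CopyTo(gzip);
    }
}

static void Decompress(string sourcePath, string destinationPath)
{
    using (FileStream input = File.OpenRead(sourcePath))
    using (GZipStream gzip = new GZipStream(input, CompressionMode.Decompress))
    using (FileStream output = File.Create(destinationPath))
    {
        gzip.CopyTo(output);
    }
}

The extension of the output file is irrelevant to GZipStream; only the bytes matter.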
Related
I am facing an issue where I need to convert a video (.mp4, for example) to .bin so it can be read by one of these infamous 3D holographic fans. At the moment I am doing it this way in C#:
private async Task<bool> convertToBin(string file)
{
    byte[] bytes = System.IO.File.ReadAllBytes(file);
    string path = Path.GetFullPath(file) + ".bin";
    string str = System.Text.Encoding.UTF8.GetString(bytes);
    System.IO.File.WriteAllText(path, str);
    return true;
}
However, the produced .bin is recognized by the fan, but when it is played, all the LEDs turn white. Furthermore, if I open a .bin generated with the fan's software, the format seems completely different: the first 8,000 lines of the correct .bin are just 0000 0000 0000 0000.
Any idea how to accomplish this?
This is a pretty broad question on file conversion, since technically you can just change the extension of any file to .bin and it is a valid .bin file.
The reason is that the .bin extension has no specific standard; it's just a collection of binary data. This means that different companies can (if they wish) implement their own formats within the files they work with.
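As a trivial illustration of that point (the file names are made up), copying a file under a new extension already produces a "valid" .bin, just not one a given device will understand:

// The bytes are unchanged; only the name differs.
System.IO.File.Copy("video.mp4", "video.bin", true);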
In terms of holo fans, most manufacturers will offer, either for free or for a small fee, a piece of video-conversion software that converts a file into a .bin that will work with the fan for you. (Also, many fans can now just work with .mp4 etc. too, but I'm guessing yours can't.)
If the first X amount of data in the correct file really is just a stream of 0s, it seems as if there is some amount of "padding" at the beginning of the file, though without being able to see the file I'm not 100% sure of that.
Either way, generic conversion to a .bin without knowing the specific format that the device/manufacturer is enforcing is pretty hard: like trying to pour the exact amount of water to fill a bucket without ever seeing the bucket.
Binary itself is meaningless, until such time as an executed algorithm defines what should be done with each bit, byte, word or block. Thus, just examining the binary and attempting to match it against known formats can lead to the wrong conclusion as to what it actually represents.
Quote from the Wiki page on Binary files :)
For example, I recorded a video using my camera and saved it as my_vacation.mp4, which is 50 MB in size. I opened the video file and an encrypted file called secret_message.dat in Visual Studio, read both with File.ReadAllBytes() in C#, concatenated the two byte arrays, and then saved the result as my_vacation_2.mp4.
The program I created for testing purposes saves the byte index where the hidden file begins, and I want to use it as a key to extract the hidden file later.
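In code, the embedding step described above amounts to roughly the following sketch (using the file names from the example):

using System;
using System.IO;

byte[] video  = File.ReadAllBytes("my_vacation.mp4");
byte[] secret = File.ReadAllBytes("secret_message.dat");

byte[] combined = new byte[video.Length + secret.Length];
Buffer.BlockCopy(video, 0, combined, 0, video.Length);
Buffer.BlockCopy(secret, 0, combined, video.Length, secret.Length);

File.WriteAllBytes("my_vacation_2.mp4", combined);

// video.Length is the byte index where the hidden file begins,
// i.e. the "key" used to extract it later.
int key = video.Length;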
Now I can play the video file normally, without any error. The total file size is 65 MB. Supposing no one could access the original file, of course no one would know that the last 15 MB of the video file is actually another file, right?
What might be the flaw of this technique? Is this also a valid steganography technique?
Is this a valid steganography technique?
Yes, it is. The definition of steganography is hiding information in another medium without someone suspecting its presence or existence. Just because it may be a bad approach doesn't change its intentions at all. If anything, a multitude of papers on steganography mention this technique in their introduction section as an example of how steganography can be applied.
What might be the flaw of this technique?
There are mainly two flaws: it is trivial to detect, and it is absolutely fragile to modification attacks.
Many formats encode their data either with a header which says in advance how many bytes to read before the end of file, or with an end-of-file marker, which means to keep reading data until the marker is encountered. By attaching your data after that, you ensure it won't be read by the appropriate format decoder. This can fool your 11-year-old cousin who knows nothing about that sort of stuff, but anyone mildly experienced can load the file and count how many bytes were read. If there are unaccounted-for bytes in the physical file, that will instantly raise red flags.
Even worse, it's trivial to fully extract your secret. You may argue it's encrypted, but remember, the aim of steganography is not to raise any suspicion. Most steganalysis approaches put a statistical number on it, e.g., a 60% chance that there is a message hidden in medium X. A few others can go a bit further and guess the approximate length of the embedded secret. In comparison, you're already caught red-handed.
Talking about length, a file of X bitrate/compression and Y duration results in a file of approximately size Z. Even an unsavvy observer will know what's up when the size is 30% larger than expected.
Now, imagine your file is communicated through an insecure channel where a warden inspects its contents; if he suspects foul play, he can modify the file so that the recipient doesn't get the message. In this case, it's as simple as loading the file and resaving it. In fact, your method is so fragile it can be destroyed by even the most unintentional of attacks. Just uploading your file to a site for playback can unwittingly re-encode it for higher compression, simply because that makes sense for the site.
Suppose no one could access the original file, of course no one would know that the last 15MB part of that video file is actually another file, right?
No. Your secret file is encrypted, so that probably rules out any headers showing up in a hex editor, but there is a problem: the MP4 container format and its structure are well known.
You can extract all video/audio tracks and what you are left with is some metadata and your secret message, so it will be obvious that it's not supposed to be there.
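To make that concrete, here is a rough sketch of how someone could spot the appended bytes: walk the top-level MP4 boxes (each begins with a 4-byte big-endian size and a 4-byte type) and see how much of the file no box accounts for. It deliberately ignores 64-bit and size-zero boxes, which real files can contain, so treat it purely as an illustration:

using System;
using System.IO;
using System.Text;

static int TrailingByteCount(string path)
{
    byte[] data = File.ReadAllBytes(path);
    int pos = 0;

    while (pos + 8 <= data.Length)
    {
        // Top-level box header: 4-byte big-endian size, then 4-byte ASCII type.
        int size = (data[pos] << 24) | (data[pos + 1] << 16)
                 | (data[pos + 2] << 8) | data[pos + 3];
        string type = Encoding.ASCII.GetString(data, pos + 4, 4);

        // Appended (encrypted) data will almost never parse as a plausible box.
        if (size < 8 || pos + size > data.Length)
            break;

        Console.WriteLine($"box '{type}': {size} bytes");
        pos += size;
    }

    return data.Length - pos;   // bytes that no box claims
}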
It is a valid technique, just not a very effective one.
In C#, I have a ZIP file that I want to corrupt by XORing or Nulling its bytes.
(By nulling I mean making all the bytes in the file zeros.)
XORing its bytes requires me to first, read the bytes to a byte array, XOR the bytes in the array with some value, then write the bytes back to the file.
Now, if I XOR/null all (or half) of the file's bytes, it gets corrupted, but if I just
XOR/null some of the bytes, say the first few bytes (or any small number of bytes at any position in the file), it doesn't get corrupted; by that I mean I can still access the file as if nothing really happened.
The same thing happened with MP3 files.
Why isn't the file getting corrupted?
And is there a "FAST" way I could corrupt a file with?
The problem is that the ZIP file I'm dealing with is big, so XORing/nulling even half of its bytes will take a couple of seconds.
Thank you so much in advance. :)
Just read all the files completely and you will probably get read errors.
But of course, if you want to keep something 'secret', use encryption.
A zip contains a small header, a directory structure (at the end), and, in between, the individual files. See Wikipedia for details.
Corrupting the first bytes is sure to corrupt the file but it is also very easily repaired. The reader won't be able to find the directory block at the end.
Damaging the last block has the same effect: the reader will give up immediately but it is repairable.
Changing a byte in the middle will corrupt one file inside the archive; its CRC check will fail.
It depends on the file format you are trying to "corrupt". It also depends on what portion of the file you are trying to modify. Lastly, it depends on how you are verifying whether it is corrupted. Most file formats have some type of error detection.
The other thing working against you is that the zip file format uses a CRC algorithm for corruption detection. In addition, there are two copies of the directory structure, so you need to corrupt both.
I would suggest you corrupt the directory structure at the end and then modify some of the bytes in the front.
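If you go that route anyway, a quick sketch of the idea (the block size is arbitrary): overwrite a small block at the start (the first local file header) and a small block at the end (the end-of-central-directory record and the tail of the central directory). Because only a few kilobytes are touched, it is fast even on a large archive:

using System;
using System.IO;

static void CorruptZip(string path, int blockSize = 4096)
{
    using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
    {
        int count = (int)Math.Min(blockSize, fs.Length);
        byte[] junk = new byte[count];
        new Random().NextBytes(junk);

        // Front: clobber the first local file header(s).
        fs.Seek(0, SeekOrigin.Begin);
        fs.Write(junk, 0, count);

        // Back: clobber the end-of-central-directory record and the
        // tail of the central directory.
        fs.Seek(-count, SeekOrigin.End);
        fs.Write(junk, 0, count);
    }
}

Keep in mind this only breaks the structure; most of the data is still present in the file and recoverable.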
I could just lock the zip entries with a password, but I don't want anybody to even open it up and see what's in it.
That makes it sound as if you're looking for a method of secure deletion. If you simply didn't want someone to read the file, delete it. Otherwise, unless you do something extreme like go over it a dozen times with different values or apply some complex algorithm over it a hundred times, there are still going to be ways to read the data, even if the format is 'corrupt'.
On the other hand, breaking a file simply to stop someone else accessing it conventionally just seems like overkill. If it's a zip, you can read it in (there are plenty of questions here on handling archive files), encrypt it with a password, and then write it back out. If it's a different type of file, there are literally a million different questions and solutions for encrypting, hiding or otherwise preventing access to data. Breaking a file isn't something you should be going out of your way to do, unless this is to help test some sort of un-zip-corrupting program or something similar, but your comments imply this is to prevent access. Perhaps a bit more background on why you want to do this could help us provide a better answer?
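For completeness, encrypting the archive rather than breaking it could look roughly like the sketch below. The key-derivation parameters and the salt/IV layout are illustrative choices, not a vetted design:

using System.IO;
using System.Security.Cryptography;

static void EncryptFile(string inputPath, string outputPath, string password)
{
    byte[] salt = new byte[16];
    using (var rng = RandomNumberGenerator.Create())
        rng.GetBytes(salt);

    using (var kdf = new Rfc2898DeriveBytes(password, salt, 100000))
    using (var aes = Aes.Create())
    {
        aes.Key = kdf.GetBytes(32);   // 256-bit key derived from the password
        aes.GenerateIV();

        using (FileStream output = File.Create(outputPath))
        {
            // Store the salt and IV up front so the file can be decrypted later.
            output.Write(salt, 0, salt.Length);
            output.Write(aes.IV, 0, aes.IV.Length);

            using (var crypto = new CryptoStream(output, aes.CreateEncryptor(),
                                                 CryptoStreamMode.Write))
            using (FileStream input = File.OpenRead(inputPath))
            {
                input.CopyTo(crypto);
            }
        }
    }
}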
I've been searching & googling a lot about this question, and I already know how to compare two files (hashes, checksums, etc.). But it's not quite what I need. What I need is described below.
Let's assume I have a file and I've backed it up. Later I made some changes to the file, so I want to apply the changes to the backup version. Since the files can be big and the changes small, I don't want to rewrite the whole file, because I'm planning to back it up over the internet (maybe FTP), which can take a lot of time.
How I see this (sample):
Backup version of file (bytes)
134 253 637 151
Newer version of file (bytes)
134 624 151 890
Instead of rewriting all bytes, we should:
change 253 to 624 (change bytes)
remove 637 bytes (remove bytes)
write 890 at the end of file (insert bytes)
Options 1, 2 and 3 do not necessarily all appear in every case.
Note that the backup file could be located somewhere else, and I only have access to it through the internet (the server could return something so we can compare the files).
How can I achieve this? I know it's possible because I know of software where it's implemented (but I couldn't find out how).
Any hints, tutorials, etc. are welcome and highly appreciated.
Thanks in advance.
You're trying to solve the same problem that every MMORPG has solved... creating and applying small patch files to update older versions of large binaries.
This is a well-studied problem and there are a number of solutions out there. For several existing options, see
Binary patch-generation in C#
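Those links cover the real solutions (bsdiff-style diffs, rsync-style rolling checksums). Purely to illustrate the basic idea, the sketch below splits both versions into fixed-size blocks, hashes them, and reports which blocks would need to be re-sent. Note that this naive version falls apart as soon as bytes are inserted or removed (everything after the edit shifts), which is exactly the case rolling checksums were designed for; the block size and hash choice are arbitrary:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;

static List<int> ChangedBlocks(byte[] oldVersion, byte[] newVersion, int blockSize = 4096)
{
    var changed = new List<int>();
    int blockCount = (Math.Max(oldVersion.Length, newVersion.Length) + blockSize - 1) / blockSize;

    using (SHA256 sha = SHA256.Create())
    {
        for (int i = 0; i < blockCount; i++)
        {
            byte[] a = HashBlock(sha, oldVersion, i * blockSize, blockSize);
            byte[] b = HashBlock(sha, newVersion, i * blockSize, blockSize);
            if (!a.SequenceEqual(b))
                changed.Add(i);          // this block differs and would be re-sent
        }
    }
    return changed;
}

static byte[] HashBlock(HashAlgorithm sha, byte[] data, int offset, int size)
{
    if (offset >= data.Length) return new byte[0];
    return sha.ComputeHash(data, offset, Math.Min(size, data.Length - offset));
}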
I have some JPEG files that I can't seem to load into my C# application. They load fine into other applications, like the GIMP. This is the line of code I'm using to load the image:
System.Drawing.Image img = System.Drawing.Image.FromFile(@"C:\Image.jpg");
The exception I get is: "A generic error occurred in GDI+.", which really isn't very helpful. Has anyone else run into this, or know a way around it?
Note: If you would like to test the problem you can download a test image that doesn't work in C#.
There's an exact answer to this problem. We ran into this at work today, and I was able to prove conclusively what's going on here.
The JPEG standard defines a metadata format: a file consists of a series of "chunks" of data (which the standard calls "segments"). Each chunk starts with an FF marker, followed by another marker byte to identify what kind of chunk it is, followed by a pair of bytes that describe the length of the chunk (a 16-bit big-endian value). Some chunks (like FFD8, "Start of Image") are critical to the file's usage, and some (like FFFE, "Comment") are utterly meaningless.
When the JPEG standard was defined, it also included the so-called "APP markers" (types FFE0 through FFEF), which were supposed to be used for "application-specific data." These are abused in various ways by various programs, but for the most part they're meaningless and can be safely ignored, with the exception of APP0 (FFE0), which is used for JFIF data: JFIF extends the JPEG standard slightly to include additional useful information like the DPI of the image.
The problem with your image is that it contains an FFE1 marker, with a size-zero chunk following that marker. It's otherwise unremarkable image data (a remarkable image, but unremarkable data) save for that weird little useless APP1 chunk. GDI+ is wrongly attempting to interpret that APP1 chunk, probably attempting to decode it as EXIF data, and it's blowing up. (My guess is that GDI+ is dying because it's attempting to actually process a size-zero array.) GDI+, if it was written correctly, would ignore any APPn chunks that it doesn't understand, but instead, it tries to make sense of data that is by definition nonstandard, and it bursts into flames.
So the solution is to write a little routine that will read your file into memory, strip out the unneeded APPn chunks (markers FFE1 through FFEF), and then feed the resulting "clean" image data into GDI+, which it will then process correctly.
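A rough sketch of such a routine (not hardened; it assumes a well-formed marker sequence up to the SOS marker and treats the degenerate zero-length case described above as occupying just its two length bytes):

using System;
using System.IO;

// Strips APP1..APP15 segments (markers FF E1 .. FF EF) from a JPEG held in memory.
static byte[] StripAppSegments(byte[] jpeg)
{
    MemoryStream output = new MemoryStream();

    // Copy the SOI marker (FF D8) as-is.
    output.Write(jpeg, 0, 2);
    int pos = 2;

    while (pos + 4 <= jpeg.Length && jpeg[pos] == 0xFF)
    {
        byte marker = jpeg[pos + 1];

        // SOS (FF DA): entropy-coded image data follows; copy the rest verbatim.
        if (marker == 0xDA)
            break;

        // Segment length is a big-endian 16-bit value that includes its own two
        // bytes; a (nonstandard) zero length still occupies those two bytes.
        int length = (jpeg[pos + 2] << 8) | jpeg[pos + 3];
        int segmentSize = 2 + Math.Max(length, 2);
        if (pos + segmentSize > jpeg.Length)
            break;                        // malformed length; give up gracefully

        // Keep every segment except APP1..APP15 (FF E1 .. FF EF).
        if (marker < 0xE1 || marker > 0xEF)
            output.Write(jpeg, pos, segmentSize);

        pos += segmentSize;
    }

    // Copy whatever is left (normally the SOS segment through to EOI).
    if (pos < jpeg.Length)
        output.Write(jpeg, pos, jpeg.Length - pos);

    return output.ToArray();
}

The cleaned bytes can then be handed to something like Image.FromStream(new MemoryStream(cleaned)), which GDI+ will process correctly.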
We currently have a contest underway here at work to see who can write the JPEG-cleaning routine the fastest, with fun prizes :-)
For the naysayers: That image is not "slightly nonstandard." The image uses APP1 for its own purposes, and GDI+ is very wrong to try to process that data. Other applications have no trouble reading the image because they rightly ignore the APP chunks like they're supposed to.
.NET isn't handling the format of that particular image, potentially because the JPEG data is slightly broken or non-standard. If you load the image into GIMP and save it to a new file, you can then load it with the Image class. Presumably GIMP is a bit more forgiving of file-format problems.
This thread from MSDN Forums may be useful.
The error may mean the data is corrupt, or that there is some underlying stream that has been closed too early.
The error can be a permission problem, especially if your application is an ASP.NET application. Try moving the file to the same directory as your executable (if WinForms) or the root directory of your web application (if ASP.NET).
I have the same problem.
The only difference I've noticed is the compression. It works fine with "JPEG", but when the compression is "Progressive JPEG" I get the exception (A generic error occurred in GDI+).
At first I thought it could be a memory problem, because the images I mentioned were kind of big (about 5 MB on disk and maybe ~80 MB in memory), but then I noticed the difference in the compression type.
When I open/save the image file in another program like IrfanView or GIMP, the result is OK, but that's not the idea.