I have a large raw data file (up to 1GB) which contains raw samples from a USB data logger.
I need to store extra information relating to the file (sample rate, description, trigger point, last seek position, etc.) and was looking into adding this as some sort of header.
The header should ideally be human readable and flexible, so I've so far ruled out any sort of binary serialization into a header.
I also want to avoid two separate files as they could end up separated when copied or backed up. I remembered somebody telling me that newer *.*x Microsoft Office documents are actually a number of files in a zip. Is there a simple way to achieve this? Could I still keep the quick seek times to the raw file?
Update
I started using the binary serializer and found it to be a pain. I ended up using the XML serializer as I'm more comfortable using it.
I reserve some space at the start of the file for the XML. Simple.
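A minimal sketch of that approach, assuming a fixed reserved region (the metadata class, file names and the 4 KB header size are illustrative, not from the original post):

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public class LogHeader   // hypothetical metadata class
{
    public int SampleRate { get; set; }
    public string Description { get; set; }
    public long TriggerPoint { get; set; }
    public long LastSeekPosition { get; set; }
}

public static class LogFile
{
    const int HeaderSize = 4096;   // assumed fixed header region

    public static void WriteHeader(string path, LogHeader header)
    {
        var serializer = new XmlSerializer(typeof(LogHeader));
        using (var ms = new MemoryStream())
        {
            serializer.Serialize(ms, header);
            if (ms.Length > HeaderSize)
                throw new InvalidOperationException("Header too large for reserved region.");

            using (var fs = new FileStream(path, FileMode.OpenOrCreate, FileAccess.Write))
            {
                // Overwrite only the reserved region; the raw samples after it are untouched.
                fs.Write(ms.ToArray(), 0, (int)ms.Length);

                // Pad the rest of the region with spaces so it stays human readable.
                var padding = new byte[HeaderSize - ms.Length];
                for (int i = 0; i < padding.Length; i++) padding[i] = (byte)' ';
                fs.Write(padding, 0, padding.Length);
            }
        }
    }

    // Raw samples always start at HeaderSize, so seeking stays cheap.
    public static FileStream OpenData(string path)
    {
        var fs = new FileStream(path, FileMode.Open, FileAccess.Read);
        fs.Seek(HeaderSize, SeekOrigin.Begin);
        return fs;
    }
}
```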
When you say you want to make the header human readable, this suggests opening the file in a text editor. Do you really want to do this, considering the file size and (I'm assuming) the remainder of the file being non-human-readable binary data? If so, just write the text header data to the start of the binary file - it will be visible when the file is opened but, of course, the remainder of the file will look like garbage.
You could create an uncompressed ZIP archive, which may allow you to seek directly to the binary data. See this for information on creating a ZIP archive: http://weblogs.asp.net/jgalloway/archive/2007/10/25/creating-zip-archives-in-net-without-an-external-library-like-sharpziplib.aspx
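The article linked above uses the System.IO.Packaging approach; as one hedged alternative on .NET 4.5 or later, System.IO.Compression can create entries with CompressionLevel.NoCompression so the raw data is stored verbatim inside the archive (file names here are assumptions):

```csharp
using System.IO.Compression;   // reference System.IO.Compression and System.IO.Compression.FileSystem

// Store the metadata and the raw samples uncompressed in a single container file.
using (var archive = ZipFile.Open("capture.zip", ZipArchiveMode.Create))
{
    archive.CreateEntryFromFile("header.xml", "header.xml",
                                CompressionLevel.NoCompression);
    archive.CreateEntryFromFile("samples.raw", "samples.raw",
                                CompressionLevel.NoCompression);
}
```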
Related
For example, I recorded a video with my camera and saved it as my_vacation.mp4, which is 50 MB in size. In Visual Studio I read the video file and an encrypted file called secret_message.dat using File.ReadAllBytes() in C#, concatenated both byte arrays, and then saved the result as my_vacation_2.mp4.
The program I created for testing purposes saves the byte index where the hidden file begins, and I want to use it as a key to extract the hidden file later.
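Roughly, the test program presumably does something like this minimal sketch (the file names are the ones mentioned above; everything else is illustrative):

```csharp
using System.IO;

// Append the encrypted payload to the video and remember where it starts.
byte[] video  = File.ReadAllBytes("my_vacation.mp4");
byte[] secret = File.ReadAllBytes("secret_message.dat");

long hiddenOffset = video.Length;          // the "key" used later for extraction

byte[] combined = new byte[video.Length + secret.Length];
video.CopyTo(combined, 0);
secret.CopyTo(combined, video.Length);
File.WriteAllBytes("my_vacation_2.mp4", combined);

// Extraction later: read everything from hiddenOffset to the end of the file.
byte[] all = File.ReadAllBytes("my_vacation_2.mp4");
byte[] recovered = new byte[all.Length - hiddenOffset];
System.Array.Copy(all, hiddenOffset, recovered, 0, recovered.Length);
```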
Now I can play that video file normally, without any error. The total file size is 65 MB. Supposing no one could access the original file, of course no one would know that the last 15 MB of that video file is actually another file, right?
What might be the flaw of this technique? Is this also a valid steganography technique?
Is this a valid steganography technique?
Yes, it is. The definition of steganography is hiding information in another medium without someone suspecting its presence or existence. Just because it may be a bad approach doesn't change its intentions at all. If anything, a multitude of papers on steganography mention this technique in their introduction section as an example of how steganography can be applied.
What might be the flaw of this technique?
There are two main flaws: it is trivial to detect and it is extremely fragile to modification attacks.
Many formats encode their data either with a header that says in advance how many bytes to read before the end of file, or with an end-of-file marker, which means data is read until the marker is encountered. By attaching your data after that point, you ensure it won't be read by the appropriate format decoder. This can fool your 11-year-old cousin who knows nothing about that sort of stuff, but anyone mildly experienced can load the file and count how many bytes were actually read. If there are unaccounted-for bytes in the physical file, that will instantly raise red flags.
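For illustration, a rough sketch of such a check against an MP4-style container (heavily simplified and full of assumptions: it ignores 64-bit box sizes and boxes that run to the end of the file):

```csharp
using System;
using System.IO;

// Walk the top-level MP4 boxes and compare the bytes they account for
// against the physical file size. Each box starts with a 4-byte big-endian
// size followed by a 4-byte type tag.
static long AccountedBytes(string path)
{
    using (var fs = File.OpenRead(path))
    using (var reader = new BinaryReader(fs))
    {
        long position = 0;
        while (position + 8 <= fs.Length)
        {
            fs.Seek(position, SeekOrigin.Begin);
            byte[] sizeBytes = reader.ReadBytes(4);
            Array.Reverse(sizeBytes);                 // big-endian -> little-endian
            uint boxSize = BitConverter.ToUInt32(sizeBytes, 0);
            if (boxSize < 8) break;                   // not a plausible box, stop walking
            position += boxSize;
        }
        return position;
    }
}

// Usage: unaccounted trailing bytes are an immediate red flag.
long declared = AccountedBytes("my_vacation_2.mp4");
long physical = new FileInfo("my_vacation_2.mp4").Length;
Console.WriteLine(physical > declared
    ? $"{physical - declared} trailing bytes not referenced by the container"
    : "No trailing data");
```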
Even worse, it's trivial to fully extract your secret. You may argue it's encrypted, but remember, the aim of steganography is to not raise any suspicion. Most steganalysis approaches put a statistical number on it, e.g., a 60% likelihood that a message is hidden in medium X. A few others can go a bit further and guess the approximate length of the embedded secret. In comparison, you're already caught red-handed.
Talking about length, a file with bitrate/compression X and duration Y results in a file of approximately size Z. Even an unsavvy observer will know what's up when the size is 30% larger than expected.
Now, imagine your file is communicated through an insecure channel where a warden inspects its contents, and if he suspects foul play, he can modify the file so that the recipient doesn't get the message. In this case, it's as simple as loading the file and resaving it. In fact, your method is so fragile it can be destroyed by even the most unintentional of attacks. Just uploading your file to a site for playback can unwittingly re-encode it for higher compression, simply because it makes sense.
Supposing no one could access the original file, of course no one would know that the last 15 MB of that video file is actually another file, right?
No. Your secret file is encrypted, so that probably rules out any recognisable headers showing up in a hex editor, but there is a problem - the MP4 container format and its structure are well known.
You can extract all video/audio tracks and what you are left with is some metadata and your secret message, so it will be obvious that it's not supposed to be there.
It is a valid technique, just not a very effective one.
I'm looking to make my own file format.
That format should contain pictures/PDFs/other files...
I also need to know how I can build a packer/unpacker for this format, to pack files into it and unpack files from it, and to read the pictures from my own format into picture boxes on my WinForm, for example.
I've searched but didn't really find what I'm looking for.
I hope someone can help me, thank you.
Zip is an excellent choice, because you can encrypt the file and of course reduce the file size in some cases (text and other uncompressed content). But if you want to create your own file format, you can easily decide the rules for storage and ordering inside the file, then serialize the info into the file, for example by object serialization or by writing the binary data to the file object by object.
If you really want to write your own file format then I would suggest one of two things. One, you could do it entirely in binary, in which case you would want a 'chunk' format. A chunk format basically has a header for each subsection; the header contains both its own size and the size of the payload. Create a serialization class for your header, then add the bytes from your payload to the filestream. Actually pretty easy to do (see the sketch below).
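A minimal sketch of such a chunk layout (the four-character type tags and the exact field layout are assumptions, not a standard):

```csharp
using System.IO;
using System.Text;

// Each chunk: a 4-character type tag, a 4-byte payload length, then the payload.
static void WriteChunk(BinaryWriter writer, string fourCharType, byte[] payload)
{
    writer.Write(Encoding.ASCII.GetBytes(fourCharType.PadRight(4).Substring(0, 4)));
    writer.Write(payload.Length);          // little-endian Int32 length
    writer.Write(payload);
}

static (string type, byte[] payload) ReadChunk(BinaryReader reader)
{
    string type = Encoding.ASCII.GetString(reader.ReadBytes(4));
    int length = reader.ReadInt32();
    return (type, reader.ReadBytes(length));
}

// Usage: pack a picture and a PDF into one container file.
using (var writer = new BinaryWriter(File.Create("bundle.dat")))
{
    WriteChunk(writer, "PIC ", File.ReadAllBytes("photo.jpg"));
    WriteChunk(writer, "PDF ", File.ReadAllBytes("manual.pdf"));
}
```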
Second (and easier) way to do this would be to create an XML format. Create a master class for your format, then add all of the data as collections of sub classes under that. Once you have that, use any of the .NET XML serialization classes to serialize it out to disk.
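A hedged sketch of that shape, using XmlSerializer (the class and file names are made up; note that XmlSerializer writes byte arrays as base64, which inflates the file size):

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public class PackageFile                    // hypothetical sub class
{
    public string Name { get; set; }
    public byte[] Data { get; set; }        // XmlSerializer writes byte[] as base64
}

public class Package                        // hypothetical master class
{
    public List<PackageFile> Files { get; set; } = new List<PackageFile>();
}

public static class PackageIo
{
    public static void Save(Package package, string path)
    {
        var serializer = new XmlSerializer(typeof(Package));
        using (var stream = File.Create(path))
            serializer.Serialize(stream, package);
    }

    public static Package Load(string path)
    {
        var serializer = new XmlSerializer(typeof(Package));
        using (var stream = File.OpenRead(path))
            return (Package)serializer.Deserialize(stream);
    }
}
```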
You can also use SQLite for your purposes. It provides DBMS power without needing a server, and it is a popular solution for this kind of problem.
System.Data.SQLite is an ADO.NET adapter for SQLite.
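For example, a rough sketch of storing files as blobs with System.Data.SQLite (the table layout, connection string and file name are assumptions):

```csharp
using System.Data.SQLite;   // NuGet package: System.Data.SQLite
using System.IO;

using (var connection = new SQLiteConnection("Data Source=bundle.db"))
{
    connection.Open();

    // One table acting as the "container": file name plus raw bytes.
    using (var create = new SQLiteCommand(
        "CREATE TABLE IF NOT EXISTS Files (Name TEXT PRIMARY KEY, Data BLOB)", connection))
        create.ExecuteNonQuery();

    using (var insert = new SQLiteCommand(
        "INSERT OR REPLACE INTO Files (Name, Data) VALUES (@name, @data)", connection))
    {
        insert.Parameters.AddWithValue("@name", "photo.jpg");
        insert.Parameters.AddWithValue("@data", File.ReadAllBytes("photo.jpg"));
        insert.ExecuteNonQuery();
    }
}
```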
In C#, I have a ZIP file that I want to corrupt by XORing or Nulling its bytes.
(by nulling I mean making all the bytes in the file zeros)
XORing its bytes requires me to first, read the bytes to a byte array, XOR the bytes in the array with some value, then write the bytes back to the file.
Now, if I XOR/null all (or half) of the file's bytes, it gets corrupted, but if I just XOR/null some of the bytes, say the first few bytes (or a small number of bytes at any position in the file), it doesn't get corrupted, and by that I mean that I can still access the file as if nothing really happened.
The same thing happened with MP3 files.
Why isn't the file getting corrupted?
And is there a "fast" way I could corrupt a file with?
The problem is that the zip file I'm dealing with is big, so XORing/nulling even half of its bytes takes a couple of seconds.
Thank you so much in advance. :)
Just read all the files completely and you will probably get read errors.
But of course, if you want to keep something 'secret', use encryption.
A zip contains a small header, a directory structure (at the end) and, in between, the individual files. See Wikipedia for details.
Corrupting the first bytes is sure to corrupt the file but it is also very easily repaired. The reader won't be able to find the directory block at the end.
Damaging the last block has the same effect: the reader will give up immediately but it is repairable.
Changing a byte in the middle will corrupt 1 file. The CRC will fail.
It depends on the file format you are trying to "corrupt". It also depends on what portion of the file you are trying to modify. Lastly, it depends on how you are verifying whether it is corrupted. Most file formats have some type of error detection.
The other thing working against you is that the zip file format uses a CRC algorithm for corruption detection. In addition, there are two copies of the directory structure, so you need to corrupt both.
I would suggest you corrupt the directory structure at the end and then modify some of the bytes in the front.
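A hedged sketch of that suggestion: instead of rewriting the whole file, seek to a couple of small regions and zero only those, which stays fast regardless of archive size (the 1 KB region size is arbitrary):

```csharp
using System.IO;

// Overwrite small regions instead of the whole file: the local file headers
// at the front and the central directory records near the end of the archive.
static void CorruptZip(string path, int regionSize = 1024)
{
    var zeros = new byte[regionSize];
    using (var fs = new FileStream(path, FileMode.Open, FileAccess.Write))
    {
        // Damage the start (local headers of the first entries).
        fs.Write(zeros, 0, (int)System.Math.Min(regionSize, fs.Length));

        // Damage the end (end-of-central-directory and central directory).
        if (fs.Length > regionSize)
        {
            fs.Seek(-regionSize, SeekOrigin.End);
            fs.Write(zeros, 0, regionSize);
        }
    }
}

CorruptZip("archive.zip");
```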
I could just lock the zip entries with a password, but I don't want anybody to even open it up and see what's in it
That makes it sound as if you're looking for a method of secure deletion. If you simply don't want someone to read the file, delete it. Otherwise, unless you do something extreme like go over it a dozen times with different values or apply some complex algorithm over it a hundred times, there are still going to be ways to read the data, even if the format is 'corrupt'.
On the other hand, breaking a file simply to stop someone else accessing it conventionally just seems overkill. If it's a zip, you can read it in (there are plenty of questions here for handling archive files), encrypt it with a password and then write it back out. If it's a different type of file, there are literally a million different questions and solutions for encrypting, hiding or otherwise preventing access to data. Breaking a file isn't something you should be going out of your way to do, unless this is to help test some sort of un-zip-corrupting program or something similar, but your comments imply this is to prevent access. Perhaps a bit more background on why you want to do this could help us provide a better answer?
Is it possible to read the contents of a .ZIP file without fully downloading it?
I'm building a crawler and I'd rather not have to download every zip file just to index their contents.
Thanks;
The tricky part is in identifying the start of the central directory, which occurs at the end of the file. Since each entry is the same fixed size, you can do a kind of binary search starting from the end of the file. The binary search is trying to guess how many entries are in the central directory. Start with some reasonable value, N, and retrieve that portion of the file at end-(N*sizeof(DirectoryEntry)). If that file position does not start with the central directory entry signature, then N is too large - halve it and repeat; otherwise, N is too small - double it and repeat. Like binary search, the process maintains the current upper and lower bound. When the two become equal, you've found the value of N, the number of entries.
The number of times you hit the webserver is at most 16, since there can be no more than 64K entries.
Whether this is more efficient than downloading the whole file depends on the file size. You might request the size of the resource before downloading, and if it's smaller than a given threshold, download the entire resource. For large resources, requesting multiple offsets will be quicker, and overall less taxing for the webserver, if the threshold is set high.
HTTP/1.1 allows ranges of a resource to be downloaded. For HTTP/1.0 you have no choice but to download the whole file.
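As an illustration of the range-request idea (a sketch under assumptions, not the answer's exact method): fetch just the tail of the resource with an HTTP Range header and scan backwards for the end-of-central-directory signature. The content length is assumed to be known, e.g. from a prior HEAD request.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

static async Task<long> FindEndOfCentralDirectoryAsync(string url, long contentLength)
{
    // The EOCD record is 22 bytes plus an optional comment of up to 64 KB,
    // so one ranged request for the tail of the resource is enough.
    long tailSize = Math.Min(contentLength, 22 + 65536);
    long start = contentLength - tailSize;

    using (var client = new HttpClient())
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        request.Headers.Range = new RangeHeaderValue(start, contentLength - 1);

        byte[] tail = await (await client.SendAsync(request)).Content.ReadAsByteArrayAsync();

        // Scan backwards for the EOCD signature 0x06054b50 (bytes 50 4b 05 06).
        for (int i = tail.Length - 22; i >= 0; i--)
            if (tail[i] == 0x50 && tail[i + 1] == 0x4b && tail[i + 2] == 0x05 && tail[i + 3] == 0x06)
                return start + i;
    }
    return -1;   // no EOCD found: not a zip, or the server ignored the Range header
}

// Usage (offset of the EOCD record, from which the central directory can be located):
// long eocd = await FindEndOfCentralDirectoryAsync("http://example.com/archive.zip", knownLength);
```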
The format suggests that the key piece of information about what's in the file resides at the end of it. Entries are then specified as an offset from that particular entry, so you'll need to have access to the whole thing, I believe.
GZip formats are able to be read as a stream I believe.
I don't know if this helps, as I'm not a programmer. But in Outlook you can preview zip files and see the actual content, not just the file directory (if they are previewable documents like a pdf).
There is a solution implemented in ArchView
"ArchView can open archive file online without downloading the whole archive."
https://addons.mozilla.org/en-US/firefox/addon/5028/
Inside the archview-0.7.1.xpi in the file "archview.js" you can look at their javascript approach.
It's possible. All you need is a server that allows reading bytes in ranges: fetch the end record (to know the size of the central directory), fetch the central directory (to know where each file starts and ends), and then fetch the proper bytes and handle them.
Here is an implementation in Python: onlinezip
[full disclosure: I'm author of library]
In my web application I am working with files. Some files are very large. I use Response.Write() to write the file to the browser. This goes well for the smaller files, but for large files this can take a while and the bandwidth is fully used.
Is it possible to split large documents and send it piece by piece to the browser? Are there other ways to send the document quicker to the browser?
I hold the document as a property of an object.
Why don't you compress the file and store it in the DB, and decompress it while extracting it?
You can do a lot of things depending on these questions:
How often does the file change?
Do I really need the files in the DB?
Why not store the file path in the DB and the file on disk?
Anyhow, since your files use an extremely large amount of bandwidth and you would want your app to respond appropriately, you might want to use AJAX to load the files asynchronously. You can have a web handler (.ashx) for this.
Here's a few examples:
http://www.dotnetcurry.com/ShowArticle.aspx?ID=193&AspxAutoDetectCookieSupport=1
http://www.viawindowslive.com/Articles/VirtualEarth/InvokingserversidecodeusingAJAX.aspx
My question is, is it possible to split large documents and send it piece by piece to the browser?
It depends on the file type, but in general no. If you are sending something like an excel file or a word doc etc. the receiving application will need all of the information (bytes) to fully form the document. You could physically separate the document into multiple ones, and that would allow you to do so.
If the bandwidth is fully used, then there is nothing you can do to "speed it up" short of compressing the document prior to sending it. In other words, zip it up (see the sketch below).
Depending on the document (I know you said .mht, but we're talking content here) you will see the size go down by some amount. Maybe it's enough, maybe not.
Either way, this is entirely a function of the amount of content you want to send versus the size of the pipe available to send it. One of those is more difficult to change than the other.
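If compressing on the fly is acceptable, one common ASP.NET pattern is to wrap the response filter in a GZipStream; a minimal sketch (assuming the client sends Accept-Encoding: gzip):

```csharp
using System.IO.Compression;
using System.Web;

// Inside the page or handler, before the document is written to the response:
var response = HttpContext.Current.Response;
response.Filter = new GZipStream(response.Filter, CompressionMode.Compress);
response.AppendHeader("Content-Encoding", "gzip");
```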
Try setting IIS's dynamic compression. By default, it's set fairly low, but you can try setting it for a higher compression level and see how much that helps.
I'm not up to speed with ASP.NET but you might be able to buffer from a FileStream to some sort of output stream.
You can use the Flush method to send the currently buffered data to the client (the browser).
Note that this has some implications, as is described aptly here.
I've considered using it myself; a project of mine sent documents that became fairly large, and I was cautious about storing the whole data in memory. In the end I decided the data was not large enough to be a problem, though.
Sadly the MSDN documentation is very, very vague on what Flush implies and you will probably have to use Google to troubleshoot.
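A hedged sketch of the FileStream-to-output-stream buffering described above, written as an .ashx handler (the path and chunk size are made up):

```csharp
using System.IO;
using System.Web;

public class DownloadHandler : IHttpHandler   // hypothetical handler
{
    public void ProcessRequest(HttpContext context)
    {
        context.Response.ContentType = "application/octet-stream";
        context.Response.BufferOutput = false;   // don't hold the whole file in memory

        var buffer = new byte[64 * 1024];
        using (var fs = File.OpenRead(context.Server.MapPath("~/App_Data/large.mht")))
        {
            int read;
            while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                context.Response.OutputStream.Write(buffer, 0, read);
                context.Response.Flush();         // push the buffered chunk to the client
                if (!context.Response.IsClientConnected)
                    break;                        // stop if the browser went away
            }
        }
    }

    public bool IsReusable => false;
}
```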