I plan to send and receive files with a microcontroller. I wrote up a simple protocol for both the sender and the receiver, but I'm having trouble reconstructing the file on the receiving end. I send the data as a stream of raw binary, but I have not found the file information (name, extension, size, etc.) anywhere in the file itself. Where is this file information stored? How does the OS know all of it (name, extension, size, etc.) if it isn't stored in the file?
Trivial question: should I attach this file information to the protocol header, or should I just append it to the file's binary data?
You need to attach that information to your binary data yourself. If you have a binary stream, the easiest approach is a fixed-size header that contains all of the file's metadata, followed by the file's content.
Why fixed size? Otherwise the receiver doesn't know where the file's content starts. You could also put the header size in the first X bytes of the stream and then use a variable-sized header. It's up to you, but I prefer the fixed-size solution.
Example of a fixed-size header:
<255 bytes file name><8 bytes file size><Content...>
Example of a dynamically sized header:
<4 bytes length of file name><x bytes file name><8 bytes file size><Content...>
Let me stress that it is very important that you also transmit the size of the content in bytes, so that the receiver knows how many bytes to read! Packets may be fragmented, you know?
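As an illustration, here is a minimal C# sketch of writing and reading the fixed-size header described above. The 255-byte name and 8-byte size fields follow the example layout; the helper names are illustrative, not part of any standard.

using System;
using System.IO;
using System.Text;

static class FileHeader
{
    // 255 bytes for the name (zero-padded), followed by an 8-byte file size.
    const int NameFieldSize = 255;

    public static void WriteHeader(Stream stream, string fileName, long fileSize)
    {
        // No length check here: names longer than 255 UTF-8 bytes would throw.
        var nameBytes = new byte[NameFieldSize];
        Encoding.UTF8.GetBytes(fileName, 0, fileName.Length, nameBytes, 0);
        stream.Write(nameBytes, 0, nameBytes.Length);

        // BitConverter uses the host's endianness; a real protocol should pin one.
        stream.Write(BitConverter.GetBytes(fileSize), 0, 8);
    }

    public static (string Name, long Size) ReadHeader(Stream stream)
    {
        var nameBytes = ReadExactly(stream, NameFieldSize);
        var sizeBytes = ReadExactly(stream, 8);
        string name = Encoding.UTF8.GetString(nameBytes).TrimEnd('\0');
        long size = BitConverter.ToInt64(sizeBytes, 0);
        return (name, size);
    }

    // Streams may return fewer bytes than requested (fragmentation), so loop.
    static byte[] ReadExactly(Stream stream, int count)
    {
        var buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int read = stream.Read(buffer, offset, count - offset);
            if (read == 0) throw new EndOfStreamException();
            offset += read;
        }
        return buffer;
    }
}

After the header, the receiver simply reads exactly Size more bytes to get the file's content.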
How does your self-made "protocol" work?
It is quite uncommon for files to store their own size; it is the responsibility of the underlying file system to keep track of that (name including extension, size, permissions, modification time, ...).
You can put the size information in the header, or, if you are sure that a certain sequence of bytes never appears in the payload, you can use it as a termination sequence to tell the receiver to stop receiving.
Related
I'm a junior software developer trying to learn more about web development, and right now I'm wondering about the following.
I'm sending a file through multiple SOAP web requests, in chunks of 100,000 bytes. The first chunk's request has a "start" operation, the following ones have "next", and the last chunk is sent with an "end" operation in the request's envelope. In the image below you can see what the envelope looks like.
I cannot check the uploaded file's content myself; I can only see the uploaded file's SIZE.
The uploaded file's size is always about a quarter of the actual size of the file I've sent.
For example, if I send a 300,000-byte file, the uploaded file's size will be approximately 75,000 bytes. A proper example can be seen below.
I'm not sure if I actually understand how this works or even if the file is uploaded correctly.
If someone could explain this to me I would greatly appreciate it. :D
Thank you a lot for your time!
PS: I tweaked the chunkLength parameter (from the envelope) and the actual buffer size (trying 1,000 bytes per request or even 100,000), and the result is the same. :)
PS2: The data is a random string that I first gzip-compress and then send through the requests mentioned above.
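For reference, here is a rough sketch of the chunking loop described above, with the SOAP call abstracted behind a hypothetical sendChunk callback; the names here are placeholders, not a real API.

using System;
using System.IO;
using System.IO.Compression;

static class ChunkedUploader
{
    const int ChunkLength = 100_000;

    // sendChunk(data, count, operation) stands in for the real SOAP request;
    // operation is "start", "next" or "end", mirroring the envelope above.
    // (A single-chunk upload would need special-casing.)
    public static void Upload(string path, Action<byte[], int, string> sendChunk)
    {
        using (var compressed = new MemoryStream())
        {
            // Gzip-compress the whole payload first, as described in PS2.
            using (var input = File.OpenRead(path))
            using (var gzip = new GZipStream(compressed, CompressionMode.Compress, leaveOpen: true))
            {
                input.CopyTo(gzip);
            }

            compressed.Position = 0;
            var buffer = new byte[ChunkLength];
            bool first = true;
            int read;
            while ((read = compressed.Read(buffer, 0, buffer.Length)) > 0)
            {
                bool last = compressed.Position == compressed.Length;
                string operation = first ? "start" : (last ? "end" : "next");
                sendChunk(buffer, read, operation);
                first = false;
            }
        }
    }
}

Because the payload is gzip-compressed before it is split into chunks, the size that ends up on the server is the compressed size, which can be considerably smaller than the original file; that may well explain why the uploaded size is smaller than what you sent.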
I'm trying to write a client-server program in which it is possible to share the content of the clipboard.
Right now I am able to share it if the content type is audio, image or text.
The idea is that I convert the content to a byte array, send it, convert it back to its original type (Stream, BitmapSource or string) and inject it into the client's clipboard using the methods Clipboard.SetAudio, Clipboard.SetImage or Clipboard.SetText.
My problem is when there are files in the clipboard. I use the method Clipboard.GetFileDropList to get a list of the files, and for each file in the list I convert it to a byte array and send it to the client. How can I inject these byte arrays into the client's clipboard?
I know there is the method Clipboard.SetFileDropList, but it requires me to provide a file list, and since the files do not exist on the client I cannot use it.
How can I solve this problem?
In order to make the client treat the files as pastable, they'll need to exist on the client filesystem in some way, since the clipboard expects a list of filenames when setting clipboard content.
This can be done by transferring the data as a stream to your client, and then making the client immediately unpack that stream to a temp folder, the path to which is obtainable via:
var temp = Environment.ExpandEnvironmentVariables("%TEMP%");
Once that's done and the files are in place, you can put those files on the clipboard as if they were the ones originally copied.
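A minimal sketch of that approach, assuming the received files arrive as name/bytes pairs; the method and folder names are illustrative.

using System.Collections.Generic;
using System.Collections.Specialized;
using System.IO;
using System.Windows;   // WPF Clipboard; use System.Windows.Forms.Clipboard for WinForms

static class ClipboardReceiver
{
    public static void PutReceivedFilesOnClipboard(IEnumerable<KeyValuePair<string, byte[]>> receivedFiles)
    {
        // Unpack each received file into a temp folder first, since
        // Clipboard.SetFileDropList needs real paths on the local machine.
        string tempDir = Path.Combine(Path.GetTempPath(), "ClipboardShare");
        Directory.CreateDirectory(tempDir);

        var paths = new StringCollection();
        foreach (var file in receivedFiles)
        {
            string path = Path.Combine(tempDir, file.Key);
            File.WriteAllBytes(path, file.Value);
            paths.Add(path);
        }

        Clipboard.SetFileDropList(paths);
    }
}

Pasting on the client then copies the temp files to wherever the user pastes, just as with a normal file copy.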
Be warned that supporting file copy/paste instead of having an option to "transfer" files could run much slower than other operations, due to how big files can get.
In C#, I have a ZIP file that I want to corrupt by XORing or Nulling its bytes.
(by nulling I mean making all the bytes in the file zeros)
XORing its bytes requires me to first, read the bytes to a byte array, XOR the bytes in the array with some value, then write the bytes back to the file.
Now, if I XOR/null all (or half) of the file's bytes, it gets corrupted, but if I just
XOR/null some of the bytes, say the first few bytes (or any small number of bytes at any position in the file), it doesn't get corrupted, and by that I mean that I can still access the file as if nothing really happened.
Same thing happened with mp3 files.
Why isn't the file getting corrupted?
And is there a FAST way to corrupt a file?
The problem is that the ZIP file I'm dealing with is big,
so XORing/nulling even half of its bytes takes a couple of seconds.
Thank You So Much In Advance .. :)
Just read all the files completely and you will probably get read errors.
But of course, if you want to keep something 'secret', use encryption.
A ZIP contains a small header, a directory structure (at the end) and, in between, the individual files. See Wikipedia for details.
Corrupting the first few bytes is sure to corrupt the file, but it is also very easily repaired.
Damaging the last block has a similar effect: the reader won't be able to find the directory block at the end and will give up immediately, but it is repairable.
Changing a byte in the middle will corrupt one file: its CRC check will fail.
It depends on the file format you are trying to "corrupt". It also depends on what portion of the file you are trying to modify. Lastly, it depends how you are verifying if it is corrupted. Most file formats have some type of error detection.
The other thing working against you is that the ZIP file format uses a CRC algorithm to detect corruption. In addition, there are two copies of the directory structure, so you need to corrupt both.
I would suggest you corrupt the directory structure at the end and then modify some of the bytes in the front.
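A quick sketch of that suggestion: XOR only a small window at the front (the local headers) and at the end (the central directory). This is fast even for large files because the bulk of the data in the middle is never touched. The 4 KB window size and the XOR value are arbitrary choices.

using System;
using System.IO;

static class ZipCorruptor
{
    public static void CorruptZip(string path, int windowSize = 4096, byte xorValue = 0xFF)
    {
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
        {
            // Corrupt the first windowSize bytes (local file headers)...
            XorRegion(stream, 0, (int)Math.Min(windowSize, stream.Length), xorValue);

            // ...and the last windowSize bytes (central directory and end-of-directory record).
            long tailStart = Math.Max(0, stream.Length - windowSize);
            XorRegion(stream, tailStart, (int)(stream.Length - tailStart), xorValue);
        }
    }

    static void XorRegion(FileStream stream, long offset, int count, byte xorValue)
    {
        var buffer = new byte[count];
        stream.Position = offset;
        int read = 0;
        while (read < count)
        {
            int n = stream.Read(buffer, read, count - read);
            if (n == 0) break;
            read += n;
        }
        for (int i = 0; i < read; i++) buffer[i] ^= xorValue;
        stream.Position = offset;
        stream.Write(buffer, 0, read);
    }
}

Note, as the next answer points out, that this only breaks the archive's structure; the compressed data in the middle is untouched and largely recoverable, so it is not a substitute for encryption or secure deletion.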
I could just lock the zip entries with a password, but I don't want anybody to even open it up and see what's in it.
That makes it sound as if you're looking for a method of secure deletion. If you simply didn't want someone to read the file, delete it. Otherwise, unless you do something extreme like go over it a dozen times with different values or apply some complex algorithm over it a hundred times, there are still going to be ways to read the data, even if the format is 'corrupt'.
On the other hand, breaking a file simply to stop someone else accessing it conventionally seems like overkill. If it's a zip, you can read it in (there are plenty of questions here about handling archive files), encrypt it with a password and then write it back out. If it's a different type of file, there are literally a million different questions and solutions for encrypting, hiding or otherwise preventing access to data. Breaking a file isn't something you should be going out of your way to do, unless it's to help test some sort of un-zip-corrupting program or something similar, but your comments imply this is to prevent access. Perhaps a bit more background on why you want to do this could help us provide a better answer?
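If the real goal is just to keep people out of the archive rather than to damage it, one option along those lines is to encrypt the whole file with a password-derived key. A minimal sketch, not production-grade key management; the iteration count and file layout are arbitrary choices.

using System.IO;
using System.Security.Cryptography;

static class FileEncryptor
{
    // Encrypts a file in place with a password-derived AES key.
    // The salt and IV are written at the start of the output so the
    // file can be decrypted later with the same password.
    public static void EncryptFile(string path, string password)
    {
        byte[] plaintext = File.ReadAllBytes(path);

        byte[] salt = new byte[16];
        using (var rng = RandomNumberGenerator.Create())
            rng.GetBytes(salt);

        using (var kdf = new Rfc2898DeriveBytes(password, salt, 100_000))
        using (var aes = Aes.Create())
        {
            aes.Key = kdf.GetBytes(32);   // AES-256
            aes.GenerateIV();

            using (var output = File.Create(path))
            {
                output.Write(salt, 0, salt.Length);
                output.Write(aes.IV, 0, aes.IV.Length);
                using (var crypto = new CryptoStream(output, aes.CreateEncryptor(), CryptoStreamMode.Write))
                {
                    crypto.Write(plaintext, 0, plaintext.Length);
                }
            }
        }
    }
}

Decryption is the mirror image: read the salt and IV back from the start of the file, derive the same key with the same iteration count, and use aes.CreateDecryptor().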
In my application I receive different files as base64 strings. After receiving those base64 strings, my application needs to convert them back into their original formats.
These files could be PDF, TXT, JPEG, BMP, GIF or PNG formats.
How do I know what format each file is, in order to convert it to its respective format? Is there any way the base64 string gives this info?
Any help will be appreciated.
The base64 data only contains the file data itself, no metadata about it (including file name / extension). You could potentially try to parse the first few bytes of the decoded base64 data to try to find out the file type, but an easier approach would be for the service to add this information in some HTTP header (such as Content-Disposition).
I think you only need to convert it from the base64 string to binary format and save it to disk. You only need to get the correct file extension or complete file name so that the user can open it with the associated program.
The only reliable means to get the file type is through metadata associated with the file. If this is not available in your case, a workaround is to read the first few bytes of the file. Many common formats require that files of that format begin with a sequence of bytes, known as "magic numbers".
This Wikipedia article provides magic numbers for the PDF, JPG, PNG, and GIF formats. BMP files typically begin with the constant 0x42 0x4D (ASCII "BM"). Since plain text files have no magic number, text would need to be the default option (i.e., if the first few bytes aren't recognized as a known magic number, assume it is a text file).
Base64 is simply a text encoding of the file's raw bytes. Decoding back to a byte sequence and inspecting the first few bytes should be sufficient to suggest a file is of a certain type. Note that this is an imperfect workaround; for instance, a text file that happens to start with a magic number (e.g., "BM") may be miscategorized as another type of file.
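As an illustration, here is a minimal sketch that decodes the base64 string and checks those signatures; only the common signatures for the formats mentioned above are included.

using System;

static class FileTypeSniffer
{
    public static string GuessExtension(string base64)
    {
        byte[] data = Convert.FromBase64String(base64);

        if (StartsWith(data, 0x25, 0x50, 0x44, 0x46)) return ".pdf";   // "%PDF"
        if (StartsWith(data, 0xFF, 0xD8, 0xFF))       return ".jpg";   // JPEG SOI marker
        if (StartsWith(data, 0x89, 0x50, 0x4E, 0x47)) return ".png";   // ".PNG"
        if (StartsWith(data, 0x47, 0x49, 0x46, 0x38)) return ".gif";   // "GIF8"
        if (StartsWith(data, 0x42, 0x4D))             return ".bmp";   // "BM"

        // Fall back to text when no known signature matches (imperfect, as noted above).
        return ".txt";
    }

    static bool StartsWith(byte[] data, params byte[] signature)
    {
        if (data.Length < signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
            if (data[i] != signature[i]) return false;
        return true;
    }
}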
I have a large raw data file (up to 1GB) which contains raw samples from a USB data logger.
I need to store extra information relating to the file (sample rate, description, trigger point, last seek position, etc.) and was looking into adding this as some sort of header.
The header should ideally be human-readable and flexible, so I've so far ruled out some sort of binary serialization.
I also want to avoid two separate files, as they could end up separated when copied or backed up. I remember somebody telling me that the newer Microsoft Office documents (.docx, .xlsx, etc.) are actually a number of files inside a ZIP. Is there a simple way to achieve this? Could I still keep the quick seek times into the raw file?
Update
I started using the binary serializer and found it to be a pain. I ended up using the XML serializer, as I'm more comfortable with it.
I reserve some space at the start of the file for the XML. Simple.
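A rough sketch of that reserved-space idea: an arbitrary 4 KB region at the start of the file holds the serialized XML, and the raw samples follow immediately after. The header fields here are illustrative.

using System.IO;
using System.Text;
using System.Xml.Serialization;

public class LogHeader
{
    // Illustrative fields; the real header would carry sample rate,
    // description, trigger point, last seek position, and so on.
    public int SampleRate { get; set; }
    public string Description { get; set; }
    public long LastSeekPosition { get; set; }
}

static class LogFile
{
    const int HeaderRegionSize = 4096; // reserved region; human-readable XML lives here

    public static void WriteHeader(string path, LogHeader header)
    {
        var region = new byte[HeaderRegionSize];
        using (var text = new StringWriter())
        {
            new XmlSerializer(typeof(LogHeader)).Serialize(text, header);
            byte[] xml = Encoding.UTF8.GetBytes(text.ToString());
            xml.CopyTo(region, 0); // throws if the XML outgrows the reserved region
        }

        // Only the first 4096 bytes are rewritten; any raw samples after them stay intact.
        using (var stream = new FileStream(path, FileMode.OpenOrCreate, FileAccess.Write))
        {
            stream.Write(region, 0, region.Length);
        }
    }

    public static LogHeader ReadHeader(string path)
    {
        using (var stream = File.OpenRead(path))
        {
            var region = new byte[HeaderRegionSize];
            stream.Read(region, 0, region.Length); // a robust version would loop until full
            string xml = Encoding.UTF8.GetString(region).TrimEnd('\0');
            using (var reader = new StringReader(xml))
                return (LogHeader)new XmlSerializer(typeof(LogHeader)).Deserialize(reader);
        }
    }
}

The raw data always starts at offset 4096, so seeking into the samples stays as cheap as it was without the header.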
When you say you want to make the header human-readable, this suggests opening the file in a text editor. Do you really want to do that, considering the file size and (I'm assuming) that the remainder of the file is non-human-readable binary data? If so, just write the text header data to the start of the binary file; it will be visible when the file is opened but, of course, the remainder of the file will look like garbage.
You could create an uncompressed ZIP archive, which may allow you to seek directly to the binary data. See this for information on creating a ZIP archive: http://weblogs.asp.net/jgalloway/archive/2007/10/25/creating-zip-archives-in-net-without-an-external-library-like-sharpziplib.aspx
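On newer framework versions, an uncompressed ZIP container like that can also be built with System.IO.Compression; a minimal sketch, with illustrative entry names.

using System.IO.Compression;   // reference System.IO.Compression and System.IO.Compression.FileSystem

static class ContainerWriter
{
    public static void CreateContainer(string zipPath, string headerXmlPath, string rawDataPath)
    {
        using (var zip = ZipFile.Open(zipPath, ZipArchiveMode.Create))
        {
            // Store both entries uncompressed so the raw bytes sit as-is inside the archive.
            zip.CreateEntryFromFile(headerXmlPath, "header.xml", CompressionLevel.NoCompression);
            zip.CreateEntryFromFile(rawDataPath, "samples.raw", CompressionLevel.NoCompression);
        }
    }
}

Whether this keeps the quick seek times depends on how the data is read back: stored entries keep the raw bytes contiguous inside the archive, but the stream returned when reading an entry is not guaranteed to be seekable, so for truly random access the reserved-header approach above may still be simpler.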