In my application I receive different files as Base64 strings.
After receiving those Base64 strings, my application needs to convert them
back into their original formats.
These files could be PDF, TXT, JPEG, BMP, GIF, or PNG.
How do I know what format a file is in so that I can convert it to the
right format? Does the Base64 string carry this information?
Any help will be appreciated.
The base64 data only contains the file data itself, no metadata about it (including file name / extension). You could potentially try to parse the first few bytes of the decoded base64 data to try to find out the file type, but an easier approach would be for the service to add this information in some HTTP header (such as Content-Disposition).
I think you only need to decode the Base64 string to binary and save it to disk. You only need the correct file extension, or the complete file name, so that the user can open it with the associated program.
The only reliable means to get the file type is through metadata associated with the file. If this is not available in your case, a workaround is to read the first few bytes of the file. Many common formats require that files of that format begin with a sequence of bytes, known as "magic numbers".
This Wikipedia article provides magic numbers for PDF, JPG, PNG, and GIF formats. BMP files typically begin with the constant 0x42 0x4D ("BM" in ASCII). Since text files contain only content, text would need to be the default option (i.e., if the first few bytes aren't recognized as a known magic number, assume it is a text file).
The Base64 string is simply a text encoding of the file's bytes. Converting it back to a byte sequence and inspecting the first few bytes should be sufficient to suggest that a file is of a certain type. Note that this is an imperfect workaround; for instance, a text file that happens to start with a magic number (e.g., "BM") may be miscategorized as another type of file.
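A minimal sketch of the magic-number check described above, written in Python for illustration. The function name `guess_extension` and the signature table are my own; the table covers only the formats named in the question, and anything unrecognized falls back to text.

```python
import base64

# Leading byte signatures ("magic numbers") for the formats in the question.
# Longer signatures are listed first only for readability; none overlap.
MAGIC_NUMBERS = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"%PDF", "pdf"),
    (b"\xff\xd8\xff", "jpg"),
    (b"BM", "bmp"),
]


def guess_extension(b64_string, default="txt"):
    """Decode a Base64 string and guess a file extension from its first bytes."""
    data = base64.b64decode(b64_string)
    for magic, extension in MAGIC_NUMBERS:
        if data.startswith(magic):
            return extension
    # No known signature: assume plain text, as suggested above.
    return default
```

A text file that begins with "BM" would be reported as `bmp` here, which is exactly the limitation mentioned above.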
I am trying to restore files that are stored in an MS SQL database (used by a third-party application that is no longer supported) as the image data type (byte arrays). What I do is write those byte arrays to files with known file extensions. However, some of them are not recognizable, and I believe they are compressed, since I get the string "CC_COMPRESS" followed by random characters in the file after conversion. Is it possible to find out which compression method was used, and how can I decompress the data before I convert it?
Following is first bytes from the byte array:
0x43435F434F4D50524553530000000000000000000000010004F60000E4780000EC7C075C54C7F3F85CA10A8A204544796001519A0D4569414414011115238A079C80C21D5204224D632C51C19268628FC6A851638B9A88882D9604C5D83BB688A002564085FBCFBEBDC71DCD16623EFFEF2FA373B33B5B667676B6BC7DCB3B9DDFF2E677DB8D6F411D70060154CB34401514C043D4E5223A00AD80F2AA65321961B54494FD07FF5FC1C3750741304E430850A19B0B2CE8B0BFD8F16DD5019A43C88490097FC4FD1107F5404368003D740036B6E6B1D816DE0C32598BB78639F89EFDDD20801AAA1C6E8CB60205E
and 43435F434F4D5052455353 is the part that decodes to the ASCII text CC_COMPRESS
Thanks in advance,
Raw deflate-compressed data begins 32 bytes in (starting with the bytes EC 7C). You can use zlib to decompress it.
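As a rough sketch of that suggestion in Python: skip the header and hand the rest to zlib in raw-deflate mode. The 32-byte offset comes from the answer above and is not verified against other rows; the function name `decompress_payload` is hypothetical.

```python
import zlib

HEADER_SIZE = 32  # assumption: taken from the answer above, may differ per row


def decompress_payload(blob, header_size=HEADER_SIZE):
    """Skip the CC_COMPRESS header and inflate the raw deflate stream after it."""
    # A negative wbits value tells zlib to expect a raw deflate stream,
    # i.e. one without a zlib or gzip wrapper around it.
    return zlib.decompress(blob[header_size:], wbits=-15)
```

If a row fails to decompress, the header length for that row is the first thing I would question.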
I plan to send and receive files with a microcontroller. I wrote a simple protocol for both sender and receiver, but I am having trouble reconstructing the file on the other end. I send the data as a stream of raw binary. However, I have not found where the file info (name, extension, size, etc.) is located in the file itself. Where is the file info stored? How does the OS know all this information (e.g. name, extension, size) if it isn't stored in the file?
Trivial question: Should I attach this file information with the protocol header? or should I just append it onto the file binary data?
You need to attach that information to your binary data yourself. If you have a binary stream, I suggest (it's easiest) you provide a fixed size header that contains all the file meta information. Then you append the file's content.
Why fixed size? Well, otherwise the receiver doesn't know where the file's content starts. You could also provide the header size in the first X bytes of the stream and then have a variable sized header. As you like it, but I prefer the fixed size solution.
Example for fixed size header:
<255 bytes file name><8 bytes file size><Content...>
Example for dynamically sized header:
<4 bytes length of file name><x bytes file name><8 bytes file size><Content...>
Let me stress that it is very important that you also transmit the size of the content in bytes, so that the receiver knows how many bytes to read! Packets may be fragmented, you know?
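To make the fixed-size layout concrete, here is a small Python sketch of the `<255 bytes file name><8 bytes file size><Content...>` example. The little-endian byte order, the UTF-8 name encoding, the zero padding, and all function names are my assumptions; the answer does not specify them.

```python
import io
import struct

# 255-byte name field followed by an 8-byte unsigned size.
# "<" means little-endian with no padding (byte order is an assumption).
HEADER = struct.Struct("<255sQ")


def pack_file(name, content):
    """Build header + content for one file."""
    encoded = name.encode("utf-8")
    if len(encoded) > 255:
        raise ValueError("file name too long for the fixed header")
    # struct pads the name field with zero bytes up to 255.
    return HEADER.pack(encoded, len(content)) + content


def read_exactly(stream, n):
    """Keep reading until n bytes have arrived; a single read may return fewer."""
    chunks = []
    remaining = n
    while remaining:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError("stream ended before the full message arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def unpack_file(stream):
    """Read one header, then exactly as many content bytes as it announces."""
    raw_name, size = HEADER.unpack(read_exactly(stream, HEADER.size))
    name = raw_name.rstrip(b"\x00").decode("utf-8")
    return name, read_exactly(stream, size)
```

The `read_exactly` loop is the part that deals with fragmentation: the receiver never assumes one read delivers the whole header or the whole file.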
How does your self-made "protocol" work?
It is quite uncommon for files to store their own size. Keeping track of that kind of information (name including extension, size, permissions, modification time, ...) is the responsibility of the underlying file system.
You can put the size information in the header, or if you are sure that a certain sequence of bytes is never sent as payload, you can use this as a termination sequence to tell the receiver to stop receiving.
When we convert an image to binary data, (let's say a .png image) is there a way to get the extension back while converting the binary to image again in .net?
Short answer, no. You can't get the name either. The file name is not generally stored in image data.
If you know what the image format is you can use either a sensible, generally recognised extension or a file extension registered to that file type on your system. Hopefully, these will not differ.
If you don't know the format perhaps you could read it before serialising to binary and prefix it to the representation.
For a less general answer please expand your question.
EDIT
I guess you could attempt to display the image using a set of potential formats, then visually assess all successful decodes to choose the correct format. Somehow, it seems easier to just include the original extension in the binary serialization.
Well guys I am in a bit of a pickle here...
I am doing some exercises on encrypting data, and one of them involves binary files. I am currently using triple DES to encrypt and decrypt the files in both VB.NET and C#...
Now the thing is, once a file is decrypted in VB.NET and saved, I can execute it again...
But for some reason, my C# file is bigger! 20,4K where the VB.NET one is 19,0. The C# file is also no longer executable...
On a closer look, the files appear almost exactly the same, but C# seems to add a few extra bytes here and there in (seemingly) random places...
I am currently using File.ReadAllText(String filepath, Encoding encoding); with UTF-8 encoding
thanks!
You say you're using File.ReadAllText... but also that these are binary files. That suggests you're treating opaque binary data (e.g. the result of encryption) as if it were text (e.g. calling Encoding.GetString on it).
Don't do that.
Basically, encryption generally works on binary data - binary in, binary out. If you need to encrypt text to text, you'll usually apply a "normal" encoding to convert the text to binary data (e.g. Encoding.UTF8.GetBytes(text)) and then use Base64 to convert the opaque binary data to text in a lossless way - e.g. with Convert.ToBase64String(encrypted).
Decrypting is just the reverse: use Convert.FromBase64String(encryptedText) to get the encrypted binary data, decrypt it, and then use Encoding.UTF8.GetString(decrypted) to get back to the text.
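The effect is easy to reproduce outside .NET. Below is a small Python sketch, not the asker's code: `ciphertext` is just a stand-in for encrypted output (no real encryption is performed), and the variable names are mine.

```python
import base64

# Stand-in for the output of an encryption step: every possible byte value.
ciphertext = bytes(range(256))

# Wrong: pushing arbitrary bytes through a text encoding. Bytes that are not
# valid UTF-8 get replaced, so the data changes and usually grows.
mangled = ciphertext.decode("utf-8", errors="replace").encode("utf-8")

# Right: Base64 maps any byte sequence to ASCII text and back without loss.
as_text = base64.b64encode(ciphertext).decode("ascii")
restored = base64.b64decode(as_text)
```

Here `mangled` differs from `ciphertext` and is longer, which matches the "few extra bytes here and there" symptom, while `restored` is byte-for-byte identical.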
I have a large raw data file (up to 1GB) which contains raw samples from a USB data logger.
I need to store extra information relating to the file (sample rate, description, trigger point, last seek position, etc.) and was looking into adding this as some sort of header.
The header file should ideally be human readable and flexible so I've so far ruled out some sort of binary serialization into a header.
I also want to avoid two separate files as they could end up separated when copied or backed up. I remembered somebody telling me that newer *.*x Microsoft Office documents are actually a number of files in a zip. Is there a simple way to achieve this? Could I still keep the quick seek times to the raw file?
Update
I started using the binary serializer and found it to be a pain. I ended up using the xml serializer as I'm more comfortable using it.
I reserve some space at the start of the files for the XML. Simple.
When you say you want to make the header human readable, this suggests opening the file in a text editor. Do you really want to do this, considering the file size and (I'm assuming) that the remainder of the file is non-human-readable binary data? If so, just write the text header data to the start of the binary file. It will be visible when the file is opened but, of course, the remainder of the file will look like garbage.
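A sketch of the "text header at the start of the file" idea, combined with the reserved space mentioned in the update. It is in Python rather than .NET, and the 4096-byte reservation, the 16-bit sample format, the XML element names, and the function names are all assumptions made for illustration.

```python
import io
import struct
import xml.etree.ElementTree as ET

HEADER_SIZE = 4096            # assumed size of the reserved header area
SAMPLE = struct.Struct("<h")  # assumed sample format: 16-bit signed, little-endian


def write_log(f, meta, samples):
    """Write an XML header padded to HEADER_SIZE, then the raw samples."""
    root = ET.Element("header")
    for key, value in meta.items():
        ET.SubElement(root, key).text = str(value)
    xml_bytes = ET.tostring(root, encoding="utf-8")
    if len(xml_bytes) > HEADER_SIZE:
        raise ValueError("header does not fit in the reserved space")
    f.seek(0)
    f.write(xml_bytes.ljust(HEADER_SIZE, b" "))  # pad with spaces
    for sample in samples:
        f.write(SAMPLE.pack(sample))


def read_header(f):
    """Parse the XML stored in the reserved area into a dict of strings."""
    f.seek(0)
    root = ET.fromstring(f.read(HEADER_SIZE).strip())
    return {child.tag: child.text for child in root}


def read_sample(f, index):
    """Seek straight to one sample; the header adds only a constant offset."""
    f.seek(HEADER_SIZE + index * SAMPLE.size)
    return SAMPLE.unpack(f.read(SAMPLE.size))[0]


# Usage with an in-memory file; a real file opened in "w+b" mode works the same.
log = io.BytesIO()
write_log(log, {"sample_rate": 48000, "description": "bench test"}, [10, -20, 30])
```

Because the header has a fixed size, seek times into the raw data are unaffected, and the header can be rewritten in place (for example to update the last seek position) without moving the samples.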
You could create an uncompressed ZIP archive, which may allow you to seek directly to the binary data. See this for information on creating a ZIP archive: http://weblogs.asp.net/jgalloway/archive/2007/10/25/creating-zip-archives-in-net-without-an-external-library-like-sharpziplib.aspx
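For what it's worth, here is a rough Python sketch of why an uncompressed (stored) ZIP still allows direct seeking: the member's bytes sit verbatim in the archive after a small local header. The member names `header.xml` and `samples.raw` and both function names are hypothetical, and this is not the approach from the linked article.

```python
import io
import struct
import zipfile

# Fixed 30-byte part of a ZIP local file header: signature, five 16-bit
# fields, three 32-bit fields, then file name length and extra field length.
LOCAL_HEADER = struct.Struct("<4s5H3I2H")


def build_archive(header_text, raw_bytes):
    """Store a text header and the raw data in one archive, without compression."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("header.xml", header_text)
        zf.writestr("samples.raw", raw_bytes)
    return buf.getvalue()


def raw_data_offset(archive_bytes, member="samples.raw"):
    """Return the byte offset in the archive where the member's data starts."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        info = zf.getinfo(member)
    if info.compress_type != zipfile.ZIP_STORED:
        raise ValueError("member is compressed; direct seeking is not possible")
    fields = LOCAL_HEADER.unpack_from(archive_bytes, info.header_offset)
    name_len, extra_len = fields[-2], fields[-1]
    return info.header_offset + LOCAL_HEADER.size + name_len + extra_len
```

Once the offset is known, sample `i` lives at `offset + i * sample_size` in the archive file, so the raw data can be read with an ordinary seek.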