My brother's computer was infected by a virus that renamed almost all the files on it, changing the file extensions as well. So a file that might have been named picture.jpg was renamed to kjfks.doc, for example.
What I have done so far to tackle this problem is:
remove the extensions from all the files (I use a recursive method to search for all files in a directory, and as I go through the files I strip the extension).
Now the files do not have an extension at all.
I think the original file names are stored in a local database created by the virus, and if I purchased the antivirus they would be renamed back to their original names.
Since my brother had created a backup, I selected the files whose creation date was later than when he performed the backup, and placed those files in a directory.
I am not interested in recovering the exact extension, as long as I can see the content of the file. For example, I can scan each file, and if it contains text I can give it a .txt extension; maybe it was originally .html or .css, but I accept that I won't be able to tell.
I believe that all PDF files should have something in common, and likewise all DOC files. How can I figure out what files of the most common types (pdf, doc, docx, png, jpg, etc.) have in common?
Edit:
I know it would probably take less time to go over all 200 of these files and test each one by hand than to create this program. It's just that I am curious to see whether it is possible to recover the file extensions.
On Unix, you can use file to determine the type of a file. There is also a port for Windows, and you can obviously write a script (batch, PowerShell, etc.) or a C# program to automate this.
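For example, a thin C# wrapper around a Windows port of file might look like this. This is just a sketch: it assumes file.exe (e.g. the GnuWin32 build) is on the PATH, and it takes the file to probe as its first argument.

// Sketch: shell out to a Windows port of `file` and print its guess.
// Assumes file.exe is on the PATH; --brief suppresses the filename prefix.
using System;
using System.Diagnostics;

class FileTypeProbe
{
    static void Main(string[] args)
    {
        var psi = new ProcessStartInfo("file", "--brief \"" + args[0] + "\"")
        {
            RedirectStandardOutput = true,
            UseShellExecute = false
        };
        using (var p = Process.Start(psi))
        {
            Console.WriteLine(p.StandardOutput.ReadToEnd().Trim());
            p.WaitForExit();
        }
    }
}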
First, congratulate your brother on doing a backup. Many people don't, and are absolutely wiped out by these problems.
You're going to have to do a lot of research, I'm afraid, but you're on the right track.
Open each file with a TextReader or a BinaryReader and examine the headers. Most of them are detectable.
For instance: Every PDF starts with "%PDF-" and then its version number. Just look at those first 5 characters. If it's "%PDF-", then put a .pdf on the filename and move on.
Similarly: "ÿØÿà..JFIF" for JPEGs, "[InternetShortcut]" for URL shortcuts, and "L...........À......Fƒ" for regular shortcuts (each "." is a zero/null byte, BTW).
ZIPs / compressed directories start with {0x50}{0x4B}{0x03}{0x04}{0x14}, and you should be aware that Office 2007/2010 documents are really ZIPs with XML files inside them.
You'll have to do some digging as you find each type, but you should be able to write something to establish most of the file types.
You'll have to write some recursion to work through directories, but you can eliminate any file with no extension.
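To make that concrete, here is a minimal C# sketch of the signature matching described above. The signature table is illustrative and far from exhaustive; extend it as you identify more types.

// Sketch: guess extensions by matching leading "magic" bytes, then rename.
using System;
using System.IO;
using System.Linq;

class ExtensionGuesser
{
    // (leading bytes, extension to restore) - an illustrative subset
    static readonly (byte[] Magic, string Ext)[] Signatures =
    {
        (new byte[] { 0x25, 0x50, 0x44, 0x46, 0x2D }, ".pdf"), // "%PDF-"
        (new byte[] { 0xFF, 0xD8, 0xFF }, ".jpg"),             // JPEG SOI marker
        (new byte[] { 0x89, 0x50, 0x4E, 0x47 }, ".png"),       // PNG
        (new byte[] { 0x50, 0x4B, 0x03, 0x04 }, ".zip"),       // ZIP (also docx/xlsx)
    };

    static void Main(string[] args)
    {
        // Recurse through every file under the directory given as argument 0.
        foreach (var path in Directory.GetFiles(args[0], "*", SearchOption.AllDirectories))
        {
            var header = new byte[8];
            using (var fs = File.OpenRead(path))
                fs.Read(header, 0, header.Length);

            foreach (var (magic, ext) in Signatures)
            {
                if (header.Take(magic.Length).SequenceEqual(magic))
                {
                    File.Move(path, path + ext); // append the guessed extension
                    break;
                }
            }
        }
    }
}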
BTW - A great tool to help with this is HxD: http://www.mh-nexus.de/ It's what I used to pull this answer together!
Good luck!
"most common types" each have it's own format and most of them have some magic bytes at the fixed position near beginning of the file. You can detect most of formats quite easily. Even HTML, XML, .CSS and similar text files can be detected by analyzing their beginning. But it will take some time to write an application that will guess the format. For some types (such as ODF format or JAR format, which are built on top of regular ZIPs) you will be also able to detect this format.
But... could such an application already exist on the market? I guess you can find something if you search, because the task is not as tricky as it initially seems.
Related
I have an ASP.NET website that stores large numbers of files such as videos. I want an easy way to allow the user to download all the files in a single package. I was thinking about creating ZIP files dynamically.
All the examples I have seen involve creating the file before it is downloaded, but potentially terabytes of information will be downloaded, and therefore the user would have a long wait. Apparently ZIP files store all the information regarding their contents at the end of the file.
My idea is to create the file dynamically as it's downloaded. That way I could allow the user to click download, the download would start, and no space on the server would be needed for pre-packaging, since the files would be copied over uncompressed, sequentially. The final part of the file would contain the information on the contents of what has been downloaded.
Has anyone had any experience with this? Does anyone know a better way of doing it? At the moment I can't see any pre-made utilities for this, but I believe it will work. If nothing exists, then I'm thinking I will have to read the ZIP file format specification and write my own code... something that will take more time than I was intending to spend on this.
https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT
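That link is PKWARE's APPNOTE.TXT, the ZIP format specification. For what it's worth, the plan in the question also maps well onto .NET's built-in ZipArchive: in ZipArchiveMode.Create it writes each entry sequentially and only emits the central directory when the archive is disposed, so nothing has to be staged on disk. A minimal sketch, assuming .NET 4.5+ and classic ASP.NET (names and paths are illustrative):

// Sketch: stream a ZIP straight into the HTTP response, no temp file.
using System.IO;
using System.IO.Compression;
using System.Web;

public static class ZipStreamer
{
    public static void StreamFilesAsZip(HttpResponse response, string[] paths)
    {
        response.ContentType = "application/zip";
        response.AddHeader("Content-Disposition", "attachment; filename=files.zip");

        using (var archive = new ZipArchive(response.OutputStream, ZipArchiveMode.Create, leaveOpen: true))
        {
            foreach (var path in paths)
            {
                // NoCompression just copies the bytes through, as the question
                // suggests, keeping per-file CPU cost minimal.
                var entry = archive.CreateEntry(Path.GetFileName(path), CompressionLevel.NoCompression);
                using (var entryStream = entry.Open())
                using (var file = File.OpenRead(path))
                    file.CopyTo(entryStream);
            }
        } // disposing the archive writes the central directory at the very end
    }
}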
I have been trying to write a simple Markdown -> docx parser/writer, but am completely stuck on the last part, which should be the easiest: compressing the folder into a .docx that Word, or any other .docx reader, will recognize.
My parser-writer is irrelevant really: I have this problem if I simply unzip any old Word-produced *.docx and then try to recompress it with the usual compression utilities, giving it the file ending .docx. Is there some mysterious header I should be adding, or do I need a special OPC compression utility, or what?
I don't so much want a tool that will do this, as to figure out what is supposed to be there. It seems to be independent of the WordprocessingML specification.
Needless to say, I don't know anything about compression. Everything I can find via Google has to do with fancy utilities you can use in business, but I'm making a little executable that would be GPL'd or something, and it should work on anything.
The most common problem with manually zipping together Open XML documents is that it will not work if you zip the directory instead of the contents. In other words, the [Content_Types].xml file and the word, docProps, and _rels directories need to reside at the root level of the zip file.
Here are steps to unzip my.docx and re-zip:
% mkdir unzipped
% cd unzipped/
% unzip ../my.docx
% zip -r ../rezipped.docx *
% open ../rezipped.docx
The compression algorithm used is standard ZIP (Deflate) compression.
7-Zip seems to offer this, though I have not tested it.
Further to what Mica said, the contents of the ZIP file are organised according to the Open Packaging Convention; cf. Microsoft's Essentials of the Open Packaging Convention.
You can use the .NET System.IO.Packaging namespace to make and manipulate .docx files; it is also implemented in the Mono project.
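If you're on .NET 4.5+, System.IO.Compression can also do the re-zip in one call. A sketch (paths are illustrative, and it needs a reference to System.IO.Compression.FileSystem); the essential part is includeBaseDirectory: false, which keeps [Content_Types].xml and the other directories at the root of the archive, exactly like the shell steps above:

// Sketch: re-zip an unpacked .docx with the contents (not the folder) at the root.
using System.IO.Compression;

class RezipDocx
{
    static void Main()
    {
        ZipFile.CreateFromDirectory(
            "unzipped",                   // folder holding the unpacked contents
            "rezipped.docx",              // output document
            CompressionLevel.Optimal,
            includeBaseDirectory: false); // don't wrap everything in a top-level folder
    }
}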
I am in the process of making a web application. It allows you to upload a .txt or .log file (IIS Logs for example).
The current way I check whether it is a .txt or .log file is by the file extension. I don't like this, as it allows anyone to rename virus.exe to virus.txt and it will upload.
How can I verify if it really is a text file?
I am sure this is a common problem, but I can't seem to find any good solutions.
As far as I know there is no perfect solution to this.
You can read a portion of bytes from the file and make an educated guess at the file type from that. Try reading through the answers to this SO post:
Using .NET, how can you find the mime type of a file based on the file signature not the extension
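To illustrate the kind of educated guess meant here, a simple heuristic check might look like the sketch below. The 4 KB window and 10% threshold are arbitrary choices, and the NUL check will reject UTF-16 text unless you also check for a byte-order mark.

// Sketch: reject files that contain NULs or too many control characters.
using System.IO;

static class TextSniffer
{
    public static bool LooksLikeText(string path)
    {
        var buffer = new byte[4096];
        int read;
        using (var fs = File.OpenRead(path))
            read = fs.Read(buffer, 0, buffer.Length);

        int suspicious = 0;
        for (int i = 0; i < read; i++)
        {
            byte b = buffer[i];
            if (b == 0)
                return false; // NUL almost never appears in plain text
            if (b < 0x20 && b != 0x09 && b != 0x0A && b != 0x0D)
                suspicious++; // control chars other than tab/LF/CR
        }
        return read == 0 || suspicious * 10 < read; // under ~10% suspicious bytes
    }
}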
I'm looking for methods to combine files, including their names and relative paths, into one single file: a folder disguised as a file. I don't need any compression or encryption, just the file data plus some binary metadata attached to each file.
It would be great if this file could be opened/inspected/unpacked with a standard file browser in Windows, as regular zip files can.
Yes, I could use zip, but I'm researching alternatives and would prefer a simple method I could implement myself in C#/.NET.
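For what it's worth, a roll-your-own container along those lines can be as simple as length-prefixed entries. Here is a sketch with an entirely made-up layout, which is also its main drawback: nothing but your own code will be able to open it.

// Sketch: pack files as [path][metadata length][metadata][data length][data].
// BinaryWriter.Write(string) length-prefixes the string for us.
using System.IO;

static class SimpleContainer
{
    public static void Pack(string outputPath, string rootDir)
    {
        // assumes rootDir has no trailing separator
        using (var writer = new BinaryWriter(File.Create(outputPath)))
        {
            foreach (var file in Directory.GetFiles(rootDir, "*", SearchOption.AllDirectories))
            {
                string relative = file.Substring(rootDir.Length + 1);
                byte[] metadata = new byte[0]; // your custom per-file metadata here
                byte[] data = File.ReadAllBytes(file);

                writer.Write(relative);
                writer.Write(metadata.Length);
                writer.Write(metadata);
                writer.Write(data.Length);
                writer.Write(data);
            }
        }
    }
}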
UPDATE
I've researched this some more and came across Microsoft's Structured Storage format. It looked promising at first, but it seems to be an obsolete format, replaced by the Open Packaging Conventions. Then I found out about the TAR format, which seems to be the most basic one, but I'm not sure yet whether I can attach custom metadata to TAR entries.
UPDATE
In the end I went with DotNetZip anyway...
Why not use zip? You can use a third-party library, like DotNetZip, to make the code easy to write. And, as you mentioned, Windows handles zip files well.
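For example (a sketch only; the paths are illustrative, and DotNetZip's per-entry Comment field can carry a small piece of text metadata per file):

// Sketch: package a folder with DotNetZip, attaching per-entry metadata.
using Ionic.Zip;

class PackWithDotNetZip
{
    static void Main()
    {
        using (var zip = new ZipFile())
        {
            // AddFile returns the ZipEntry, whose Comment can hold metadata.
            var entry = zip.AddFile(@"C:\data\readme.txt", "data");
            entry.Comment = "small per-entry metadata goes here";

            zip.AddDirectory(@"C:\data\logs", "data/logs"); // relative paths kept
            zip.Save(@"C:\out\package.zip");
        }
    }
}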
If you have a specific reason to look for an alternative to ZIP, take a look at virtual file systems, e.g. CodeBase File System or our Solid File System. Solid File System lets you add alternate data streams (like in NTFS) or tags (small chunks of binary or text data) to each file or directory. And with the OS edition of SolFS you can make the file system visible to Windows (including Explorer and third-party applications).
I must admit that while virtual file systems are easy to use (easier than ZIP), they are commercial products (I haven't seen any free virtual file system implementations yet).
I'm trying to find some lost .jpg pictures. Here's a .bat file to set up a simplified version of my situation:
md TestSetup
cd TestSetup
md a
cd a
echo "Can we find this later?" > a.abc
del a.abc
cd..
rd a
What code would be needed to open the text file again? I'm actually looking for .jpeg files that were treated in a similar manner.
More details: I'm trying to recover picture files from a previous one-touch backup in which the directories and files have been deleted; everything was saved in the backup with a single-character name, and every file has the same three-letter extension. There is a current backup, but they need to view the previously deleted files (or at least the .jpg ones).
Here's how I was trying to approach it: C# code
To the best of my knowledge, most file recovery tools actually read the low-level filesystem format on the disk and try to piece together deleted files. This works because, at least in FAT, a deleted file still resides in the sector specifying the directory (just with a different first character to identify it as "deleted"). New files may overwrite these deleted entries and therefore make the file unrecoverable. That's just a little bit of theory.
"There is a current backup but they need to view the previous deleted ones (or at least the .jpg files)."
Unless there's a backup of that file from the point in time you want to restore, I believe you're going to have a hard time getting it back without resorting to a low-level filesystem read. And even then, you may be out of luck if enough revisions have been made (or if it's not a trivial filesystem like FAT).
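To illustrate the signature-scanning idea, here is a naive "carving" sketch that looks for JPEG start/end markers in a raw disk image. Real recovery tools handle fragmentation and filesystem metadata; this only finds contiguously stored images, and the image path is illustrative.

// Sketch: dump byte ranges between JPEG SOI (FF D8 FF) and EOI (FF D9) markers.
using System;
using System.IO;

class JpegCarver
{
    static void Main()
    {
        byte[] disk = File.ReadAllBytes(@"C:\recovery\drive.img"); // raw dump
        int count = 0;

        for (int i = 0; i < disk.Length - 3; i++)
        {
            if (disk[i] == 0xFF && disk[i + 1] == 0xD8 && disk[i + 2] == 0xFF)
            {
                for (int j = i + 2; j < disk.Length - 1; j++)
                {
                    if (disk[j] == 0xFF && disk[j + 1] == 0xD9)
                    {
                        int length = j + 2 - i;
                        byte[] candidate = new byte[length];
                        Array.Copy(disk, i, candidate, 0, length);
                        File.WriteAllBytes("carved_" + count++ + ".jpg", candidate);
                        i = j + 1; // resume scanning after this image
                        break;
                    }
                }
            }
        }
        Console.WriteLine("Wrote " + count + " candidate JPEGs.");
    }
}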