Is there a reliable way to determine a text file? - c#

I am in the process of making a web application. It allows you to upload a .txt or .log file (IIS Logs for example).
Currently I tell .txt and .log files apart only by checking the file extension. I don't like this, because anyone can rename virus.exe to virus.txt and it will upload.
How can I verify if it really is a text file?
I am sure this is a common problem, but I can't seem to find any good solutions.

As far as I know there is no perfect solution to this.
You can read the first bytes of the file and make an educated guess about the file type from them. Try reading through the answers to this SO post:
Using .NET, how can you find the mime type of a file based on the file signature not the extension
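To make the "educated guess" idea concrete, here is a minimal sketch of one possible heuristic. The class name TextSniffer, the 8 KB sample size and the 5% threshold are all assumptions chosen for illustration, not anything taken from the linked post. It treats a NUL byte as a sign of binary data, so UTF-16 text would be rejected, and passing the check does not prove the upload is harmless.

```csharp
using System;
using System.IO;

public static class TextSniffer
{
    // Reads up to sampleSize bytes from the stream and applies the heuristic below.
    public static bool LooksLikeText(Stream stream, int sampleSize = 8192)
    {
        var buffer = new byte[sampleSize];
        int total = 0;
        int read;
        // Stream.Read may return fewer bytes than requested, so loop until full or EOF.
        while (total < buffer.Length &&
               (read = stream.Read(buffer, total, buffer.Length - total)) > 0)
        {
            total += read;
        }
        return LooksLikeText(buffer, total);
    }

    // Heuristic only: rejects data containing NUL bytes or too many control characters.
    public static bool LooksLikeText(byte[] buffer, int count)
    {
        if (count == 0) return true; // an empty file contains nothing binary

        int suspicious = 0;
        for (int i = 0; i < count; i++)
        {
            byte b = buffer[i];
            if (b == 0) return false; // NUL does not occur in ASCII or UTF-8 text

            bool allowedControl = b == 9 || b == 10 || b == 13; // tab, LF, CR
            if (b < 32 && !allowedControl) suspicious++;
        }

        // Assumed threshold: fewer than 5% unexpected control characters.
        return suspicious * 100 / count < 5;
    }
}
```

In an upload handler you would pass the uploaded file's input stream to LooksLikeText and still keep the extension check as a first filter.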

Related

Reading a file and search for user-inputted strings

My senior/friend asked me to write a tool that simply checks whether an SRecord file (.mot file) contains strings typed in by the user of the tool.
I was wondering whether this solution would work for SRecord files.
Before this, I just googled how to read a file and got this as the number one hit. But I'm not sure whether these links are useful for what I'm supposed to do.
The strings depend on the user, as the input doesn't have a clear format, but from what I've been told, the address and the data of a record would be typed in and the program would check whether that input exists in the .mot file.
Will the links provided above work for reading SRecord files? When I tried a file analysis tool, the .mot files were reported as being of the Text file type, so maybe the solutions in the links are applicable?
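I don't know what an SRecord file looks like, but if it's a text file and you are looking for occurrences of some string, you have already found some useful references.
File.ReadAllText(@"C:\file.mot").Contains("someuserinput");
will give you a boolean telling you whether your search input is in the file. Note the verbatim string (@"..."): in a regular string literal, \f would be read as a form-feed escape rather than a backslash followed by f.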
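If you also want to know where the input occurs, a line-by-line search is one option. This is only a sketch: the class and method names (SRecordSearch, FindLines) are made up for illustration, and it assumes the user types hex digits that should match regardless of upper or lower case.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SRecordSearch
{
    // Returns the 1-based line numbers of all records that contain the user's input.
    public static List<int> FindLines(TextReader reader, string userInput)
    {
        var hits = new List<int>();
        string needle = userInput.Trim();
        string line;
        int lineNumber = 0;

        while ((line = reader.ReadLine()) != null)
        {
            lineNumber++;
            // Hex digits in S-records may be upper or lower case, so ignore case.
            if (line.IndexOf(needle, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                hits.Add(lineNumber);
            }
        }
        return hits;
    }
}
```

For a file on disk you could call it as SRecordSearch.FindLines(File.OpenText(@"C:\file.mot"), input). Note that an address and its data that the user types as one string will only match if they are adjacent within a single record.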

C# - Checking whether byte array is a valid video file, and of what type

I am working on a system that saves temporary files in windows\temp. These files take on a .tmp file extension.
I am working on functionality that needs to read one of these files, identify whether it is an image or video file, and the filetype. Since the files are saved as .tmp, I can not use the file extension.
I've already written code that identifies whether the file is a valid image file, and its filetype - This was actually quite easy, to my surprise!
My question is this: How can I identify whether an array of bytes is a valid video file, and if it is, how can I identify its filetype?
As I understand it, this is in general not an easy task, as there are hundreds of formats. But if you learn about binary signatures (also called file signatures), you'll get a step further with this question.
Here is an idea:
http://www.den4b.com/wiki/ReNamer:Binary_Signatures
And here more information:
http://en.wikipedia.org/wiki/List_of_file_signatures
Good luck :-)
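As a rough illustration of the signature approach, the sketch below checks a byte array against a handful of well-known container signatures. The class name VideoSniffer and the choice of formats are assumptions for the example. It only guesses the container: it does not prove the file is a valid, playable video, and the "ftyp" check also matches non-video files from the same family (for example M4A audio).

```csharp
using System;

public static class VideoSniffer
{
    // Returns a short container name, or null when no known signature matches.
    public static string DetectContainer(byte[] data)
    {
        if (data == null || data.Length < 12) return null;

        // AVI: "RIFF", a 4-byte size, then "AVI "
        if (HasAscii(data, 0, "RIFF") && HasAscii(data, 8, "AVI ")) return "avi";

        // MP4/MOV family: a 4-byte box size, then "ftyp"
        if (HasAscii(data, 4, "ftyp")) return "mp4";

        // Matroska/WebM: EBML header 1A 45 DF A3
        if (data[0] == 0x1A && data[1] == 0x45 && data[2] == 0xDF && data[3] == 0xA3) return "mkv";

        // Flash video: "FLV"
        if (HasAscii(data, 0, "FLV")) return "flv";

        return null;
    }

    // True when the ASCII text appears in data at the given offset.
    private static bool HasAscii(byte[] data, int offset, string text)
    {
        if (offset + text.Length > data.Length) return false;
        for (int i = 0; i < text.Length; i++)
        {
            if (data[offset + i] != (byte)text[i]) return false;
        }
        return true;
    }
}
```

You would extend the list from the signature tables linked above as you run into more formats.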

How to reliably detect mime type of uploaded [text-based] file in asp.net?

My site allows for resume upload, but I want to make sure users won't be uploading anything else but plain text, rtf or word documents (both old *.doc and new *.docx formats). Obviously I can't go entirely by extension, I need to somehow detect file's mime type by its content. Any ideas how to reliably do that for the above types?
This is a duplicate of the question Using .NET, how can you find the mime type of a file based on the file signature not the extension, here on Stack Overflow. That one includes an answer with a code sample that uses the FindMimeFromData method from urlmon.dll.
The browser will send you a mime type when the file is uploaded. While not 100% consistent or reliable, that might be your best bet.

Detect file extension c#

My brother got a virus on his computer, and what that virus did was rename almost all of the files on it. It changed the file extensions as well, so a file that might have been named picture.jpg was renamed to kjfks.doc, for example.
What I have done so far to solve this problem is:
remove all file extensions from the files. (I use a recursive method to search for all files in a directory, and as I go through the files I remove the extension.)
Now the files do not have an extension. The files now look like:
I think the original file names are stored in a local database created by the virus, and if I purchase the anti virus they will be renamed back to their original names.
Since my brother created a backup, I selected the files that had a creation date later than the backup and placed those files in a directory.
I am not interested in getting exactly the right extension, as long as I can see the content of the file. For example, I will scan each file, and if it has text inside I will give it a .txt extension. Maybe it was originally .html or .css; I know I will not be able to tell.
I believe that all pdf files should have something in common, and doc files should also have something in common. How can I figure out what files of the most common types (pdf, doc, docx, png, jpg, etc.) have in common?
Edit:
I know it will probably take less time to go through all 200 of these files and test each one by hand than to create this program. I am just curious to see whether it is possible to recover the file extension.
On Unix, you can use the file command to determine the type of a file. There is also a port for Windows, and you can obviously write a script (batch, PowerShell, etc.) or a C# program to automate this.
First, congratulate your brother on doing a backup. Many people don't, and are absolutely wiped out by these problems.
You're going to have to do a lot of research, I'm afraid, but you're on the right track.
Open each file with a TextReader or a BinaryReader and examine the headers. Most of them are detectable.
For instance: Every PDF starts with "%PDF-" and then its version number. Just look at those first 5 characters. If they are "%PDF-", then put a .pdf extension on the filename and move on.
Similarly: "ÿØÿà..JFIF" for JPEG's, "[InternetShortcut]" for URL shortcuts, "L...........À......Fƒ" for regular shortcuts (the "." is a zero/null, BTW)
ZIPs / Compressed directories start with {0x50}{0x4B}{0x03}{0x04} (usually followed by {0x14}, which is a version field rather than part of the signature), and you should be aware that Office 2007/2010 documents are really ZIPs with XML files inside of them.
You'll have to do some digging as you find each type, but you should be able to write something to establish most of the file types.
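A table of known headers keeps this kind of check easy to extend. The sketch below uses the signatures mentioned above plus PNG, which is added here because the question asks about it. The class name ExtensionGuesser and the 32-byte header size are arbitrary choices for the example. The JPEG entry matches only the first three bytes (FF D8 FF), so it also covers JPEGs that do not carry a JFIF marker.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class ExtensionGuesser
{
    // Extension to leading "magic" bytes. Extend this table as you identify more types.
    private static readonly Dictionary<string, byte[]> Signatures = new Dictionary<string, byte[]>
    {
        { ".pdf", Encoding.ASCII.GetBytes("%PDF-") },
        { ".jpg", new byte[] { 0xFF, 0xD8, 0xFF } },
        { ".png", new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A } },
        { ".zip", new byte[] { 0x50, 0x4B, 0x03, 0x04 } },
        { ".url", Encoding.ASCII.GetBytes("[InternetShortcut]") },
    };

    // Returns the first extension whose signature the header starts with, or null.
    public static string GuessExtension(byte[] header)
    {
        foreach (var entry in Signatures)
        {
            byte[] magic = entry.Value;
            if (header.Length < magic.Length) continue;

            bool match = true;
            for (int i = 0; i < magic.Length; i++)
            {
                if (header[i] != magic[i]) { match = false; break; }
            }
            if (match) return entry.Key;
        }
        return null;
    }

    // Convenience overload: reads the first bytes of a file and guesses from them.
    public static string GuessExtension(string path)
    {
        var header = new byte[32];
        using (var stream = File.OpenRead(path))
        {
            int read = stream.Read(header, 0, header.Length);
            Array.Resize(ref header, read);
        }
        return GuessExtension(header);
    }
}
```

A file that comes back as ".zip" may really be an Office 2007/2010 document, so treat that result as "some ZIP-based format" rather than a final answer.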
You'll have to write some recursion to work through directories, but you can eliminate any file with no extension.
BTW - A great tool to help with this is HxD: http://www.mh-nexus.de/ It's what I used to pull this answer together!
Good luck!
Each of the "most common types" has its own format, and most of them have some magic bytes at a fixed position near the beginning of the file. You can detect most formats quite easily. Even HTML, XML, CSS and similar text files can be detected by analyzing their beginning. But it will take some time to write an application that guesses the format. Some types (such as ODF or JAR) are built on top of regular ZIPs; these can be detected too, by looking at the entries inside the archive.
But ... could it be that such an application already exists on the market? I guess you can find something if you search, because the task is not as tricky as it initially seems.
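For the ZIP-based formats, one way to tell them apart is to open the archive and look for entries that are characteristic of each type. The following is a sketch under some assumptions: it needs the ZipArchive class (.NET 4.5 or later), the class name ZipFlavor is invented for the example, and the entry names checked are the conventional ones for Office Open XML, OpenDocument and JAR files, not an exhaustive test.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class ZipFlavor
{
    // Guesses which ZIP-based format a stream holds by looking for well-known entries.
    public static string Detect(Stream zipStream)
    {
        using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Read, true))
        {
            // Office Open XML documents keep their main part at a fixed path.
            if (archive.GetEntry("word/document.xml") != null) return "docx";
            if (archive.GetEntry("xl/workbook.xml") != null) return "xlsx";
            if (archive.GetEntry("ppt/presentation.xml") != null) return "pptx";

            // OpenDocument files carry a "mimetype" entry naming the document type.
            ZipArchiveEntry mimetype = archive.GetEntry("mimetype");
            if (mimetype != null)
            {
                using (var reader = new StreamReader(mimetype.Open(), Encoding.ASCII))
                {
                    string mime = reader.ReadToEnd().Trim();
                    if (mime.StartsWith("application/vnd.oasis.opendocument", StringComparison.Ordinal))
                    {
                        return "odf";
                    }
                }
            }

            // JAR files contain a manifest at this path.
            if (archive.GetEntry("META-INF/MANIFEST.MF") != null) return "jar";

            return "zip"; // a plain archive, or a flavor this sketch does not know
        }
    }
}
```

You would call this only after the first four bytes have already identified the file as a ZIP, since ZipArchive throws on data that is not a valid archive.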

Protecting a Xml file

I'm fairly new to coding, and I just got help figuring out how to create a Xml file; now I want to know, is there a way to protect my Xml file from being edited?
I'm making a simple Command Prompt game, and I'm going to include an Xml file for info storage purposes. Although I don't want the user to be able to change the info contained in the file.. Is there a way to achieve this? It doesn't need to be extensive at this time, due to the program only being a small project.
Anyway, I'm making the program with Visual Studio Pro 2010, and I'm coding it in C#.
Thank you, for any help in advance.
The standard way to verify that parts of your XML have not been modified is to use XML Signature.
This MSDN example shows how this is done with .NET 4.
I would embed your XML file as a resource of your console application's assembly. The XML file will exist as an embedded resource and not as a separate file that the user could potentially change. If the user isn't meant to edit a configuration file, don't even let him see it, modify it, or delete it.
Look at this topic: decrypt and encrypt
I have created my own Encrypter class based on these classes. You can then create one for yourself for future use.
You could simply compress it, if you don't need a high level of security. You could use a standard format (ZIP, CAB), or just deflate the stream and store it as a binary file. See the doc and examples about this here: DeflateStream Class
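A minimal sketch of the "just deflate the stream" option, using DeflateStream on in-memory data. The class and method names (SaveFile, Pack, Unpack) are made up for illustration. Keep in mind that this only makes the file unreadable in a text editor: anyone who knows it is deflate data can unpack it, so it is obfuscation rather than protection.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class SaveFile
{
    // Compresses the XML text into raw deflate bytes.
    public static byte[] Pack(string xml)
    {
        using (var output = new MemoryStream())
        {
            // leaveOpen: true keeps the MemoryStream usable after the DeflateStream is disposed.
            using (var deflate = new DeflateStream(output, CompressionMode.Compress, true))
            {
                byte[] raw = Encoding.UTF8.GetBytes(xml);
                deflate.Write(raw, 0, raw.Length);
            } // disposing the DeflateStream flushes the remaining compressed data

            return output.ToArray();
        }
    }

    // Restores the XML text from bytes produced by Pack.
    public static string Unpack(byte[] packed)
    {
        using (var input = new MemoryStream(packed))
        using (var deflate = new DeflateStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(deflate, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

To persist the data you could write it with File.WriteAllBytes("save.dat", SaveFile.Pack(xml)) and load it with SaveFile.Unpack(File.ReadAllBytes("save.dat")), where save.dat is just a placeholder name.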
You can't prevent anyone from editing your XML file, but you can encrypt it to protect your data.
