I have already seen solutions for hiding text files or messages within image or audio files,
but I want a solution for hiding a text file within another text file (.txt, .doc, .pdf).
Can somebody help with this?
Steganography is based on slightly changing data to "hide" some other set of data within those changes. That's why an image with steganography is slightly different from the original. You can't notice it if you don't know it's there, but the fact is you saved the data as changes within the color information of pixels.
A .txt file is nothing more than a big chunk of characters. If you tried to change the data to hide something in it, the result would be unreadable text. If you change the color of a pixel from 215 Red to 217 Red, you won't really notice. But changing A to F or Ł is quite noticeable.
So no, I don't believe it can be done. At least not with .txt files.
While I agree with @stonehead that, at the end of the day, if you put something in the file someone can find it, there are a few tricks out there that may prove to be viable options.
Since most users are not living in their command prompt, the most straightforward approach is to misrepresent the file to the GUI. This is a pretty handy trick for that.
http://www.howtogeek.com/howto/windows-vista/stupid-geek-tricks-hide-data-in-a-secret-text-file-compartment/
If you are storing data in a PDF you should have very few problems. I would use PdfClown. Not to get too deep into it, but you will want to read up on the structure of a PDF. With PdfClown you could store an asset inside the file with no connection to the presentation layer. Given the complexity of PDF files, I would bet almost no one will look inside. I would base64-encode the chunk so it blends in with images and other data; it would then be difficult for someone to find it just by opening the file.
Be forewarned: the PdfClown C# library is not for the faint of heart, and it helps to have some Java experience because a lot of its docs are for Java.
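As a rough illustration of the base64 step only (not of PdfClown's API, which is not shown here), the following Python sketch turns an arbitrary payload into ASCII-safe text and back. The function names are made up for this example.

```python
import base64


def encode_payload(payload: bytes) -> str:
    """Return the payload as base64 text, safe to store where ASCII is expected."""
    return base64.b64encode(payload).decode("ascii")


def decode_payload(text: str) -> bytes:
    """Reverse encode_payload; raises binascii.Error on malformed input."""
    return base64.b64decode(text, validate=True)
```

Keep in mind that base64 is an encoding, not encryption: anyone who finds the chunk can decode it, so encrypt the payload first if its contents matter.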
Hope these options help.
Currently, I can extract all the text chunks with their location data from a PDF. The problem is that the PDF contains images with text annotations which I do not want included in the extraction.
However, for whatever reason whenever I search the PDF for images, it only finds 1 of the images and usually throws the exception: The colour space is not supported. It's as if it doesn't recognise them as images?
I am not wishing to extract the images, just locate where they start and end in relation to the PDF so I can exempt the text that is on top of the images.
For example:
Where the numbers on the graph are unwanted and need to be removed from the extracted text.
I'm just not sure how to:
A) Locate all the images and store the coordinates of where it starts and ends
B) Ignore the text that is on top of the images in the PDF document
(I am using iTextSharp to try and achieve this, but so far I am not having much luck)
I'm not exactly sure how iTextSharp works but the PostScript language reference or the PDF Reference manuals may be a good place to start figuring out what you need to know.
I just cracked open a PDF file in a text editor to check out the format because I haven't seen it in a while and then realized what the problem might be.
PDFs support "Images", and "Stream Objects" which can contain image data. Stream objects actually declare enough information that you can know where they begin and end and write something to manually ignore them.
A Stream Object Header looks like this:
<</Intent/RelativeColorimetric/Subtype/Image/Length 19678/Filter/DCTDecode/Name/X/Metadata 4314 0 R/BitsPerComponent 8/ColorSpace 5247 0 R/Width 290/Height 372/Type/XObject>>stream
It's entirely possible that your particular PDF has only one "Image" and then the rest of it is "Streams".
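To make the "know where they begin and end" idea concrete, here is a minimal Python sketch that scans raw PDF bytes for stream objects whose dictionary contains /Subtype /Image and reports their byte ranges. It is a naive scan under several assumptions: it only handles a direct integer /Length (not an indirect reference such as `12 0 R`), it does not look inside compressed object streams, and the function name is invented for this example. Note that it yields byte offsets in the file, not the position of the image on the page.

```python
import re

OBJ_START = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")
STREAM_KEYWORD = re.compile(rb">>\s*stream\r?\n")
# Direct integer length only; "/Length 12 0 R" (indirect) is deliberately skipped.
DIRECT_LENGTH = re.compile(rb"/Length\s+(\d+)\b(?!\s+\d+\s+R)")
IMAGE_SUBTYPE = re.compile(rb"/Subtype\s*/Image\b")


def find_image_streams(pdf: bytes):
    """Return a list of dicts with the object number and byte range of image streams."""
    found = []
    pos = 0
    while True:
        obj = OBJ_START.search(pdf, pos)
        if obj is None:
            break
        end_of_obj = pdf.find(b"endobj", obj.end())
        keyword = STREAM_KEYWORD.search(pdf, obj.end())
        # If "endobj" comes before the next "stream" keyword, this object has no stream.
        if keyword is None or (end_of_obj != -1 and end_of_obj < keyword.start()):
            pos = obj.end()
            continue
        header = pdf[obj.end():keyword.start()]
        length = DIRECT_LENGTH.search(header)
        if length is None:
            pos = keyword.end()  # length unknown, resume right after the keyword
            continue
        start = keyword.end()
        end = start + int(length.group(1))
        if IMAGE_SUBTYPE.search(header):
            found.append({"object": int(obj.group(1)), "start": start, "end": end})
        pos = end  # skip the binary data so it is never mistaken for PDF syntax
    return found
```

Running this over a file shows quickly whether it really holds one image or several, which is the first thing to check before blaming the library.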
I suggest cracking it open to take a look. It would also be beneficial if you included some sample code showing how you're using the library.
I also found, by opening a PDF in a text editor, the string /Type /Page, which seems to mark each new page, so there's a chance you could count those to determine which page you're currently on.
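A minimal Python sketch of that counting idea follows. One caveat it has to handle: the page tree node is written /Type /Pages, which a plain substring search for /Type /Page would also match, so the pattern rejects a trailing letter. The function name is hypothetical, and the scan assumes the page objects are stored uncompressed in the file.

```python
import re

# "/Type /Page" but not "/Type /Pages" (the page tree node).
PAGE_OBJECT = re.compile(rb"/Type\s*/Page(?![A-Za-z])")


def count_page_objects(pdf: bytes) -> int:
    """Count page object markers in raw, uncompressed PDF bytes."""
    return len(PAGE_OBJECT.findall(pdf))
```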
The header at the top of the document I'm reviewing is %PDF-1.2 and the latest version is 1.7, so there may be some disparity here because of that.
Any chance you can share the PDF file you're working with?
For example, I recorded a video using my camera and saved it as my_vacation.mp4, whose size is 50MB. In a Visual Studio project I read the video file and an encrypted file called secret_message.dat with File.ReadAllBytes() in C#, concatenated both byte arrays, and then saved the result as my_vacation_2.mp4.
The program I created for testing purposes saves the byte index where the hidden file begins, and I want to use that index as the key to extract the hidden file later.
Now I can play that video file normally, without any error. Total file size is 65MB. Suppose no one could access the original file, of course no one would know that the last 15MB part of that video file is actually another file, right?
What might be the flaw of this technique? Is this also a valid steganography technique?
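For readers who want to try the scheme, here is a minimal sketch of the append-and-record-offset idea described above, written in Python rather than the C# used in the question. The function names are invented for the example, and the whole file is held in memory, as with File.ReadAllBytes().

```python
from pathlib import Path
from typing import Tuple


def append_payload(cover: bytes, payload: bytes) -> Tuple[bytes, int]:
    """Concatenate cover and payload; the returned offset is where the payload starts."""
    return cover + payload, len(cover)


def extract_payload(combined: bytes, offset: int) -> bytes:
    """Recover the payload given the offset recorded at embedding time."""
    return combined[offset:]


def append_file(cover_path: str, payload_path: str, out_path: str) -> int:
    """File-based wrapper: writes the combined file and returns the offset to keep as the key."""
    combined, offset = append_payload(
        Path(cover_path).read_bytes(), Path(payload_path).read_bytes()
    )
    Path(out_path).write_bytes(combined)
    return offset
```

The offset is simply the length of the original video, so it is only a secret for as long as nobody can work out where the real video data ends.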
Is this a valid steganography technique?
Yes, it is. The definition of steganography is hiding information in another medium without someone suspecting its presence or existence. Just because it may be a bad approach doesn't change its intentions at all. If anything, a multitude of papers on steganography mention this technique in their introduction section as an example of how steganography can be applied.
What might be the flaw of this technique?
There are mainly 2 flaws: it is trivial to detect and is absolutely fragile to modification attacks.
Many formats encode their data either by a header which says in advance how many bytes to read before the end of file, or by putting an end-of-file marker, which means to keep on reading data until the marker is encountered. By attaching your data after that, you ensure they won't be read by the appropriate format decoder. This can fool your 11-year old cousin who knows nothing about that sort of stuff, but anyone mildly experienced can load the file and count how many bytes were read. If there are unaccounted bytes in the physical file, that will instantly raise red flags.
Even worse, it's trivial to fully extract your secret. You may argue it's encrypted, but remember, the aim of steganography is to not raise any suspicion. Most steganalysis approaches put a statistical number on it, e.g., a 60% chance that a message is hidden in medium X. A few others can go a bit further and guess the approximate length of the embedded secret. In comparison, you're already caught red-handed.
Talking about length, a file of X bitrate/compression and Y duration results in a file of approximately size Z. Even an unsavvy person will know what's up when the size is 30% larger than expected.
Now, imagine your file is communicated through an insecure channel where a warden inspects its contents and, if he suspects foul play, modifies the file so that the recipient doesn't get the message. In this case, it's as simple as loading the file and resaving it. In fact, your method is so fragile it can be destroyed by even the most unintentional of attacks. Just uploading your video to a site for playback can do it, because the site may re-encode the file for higher compression simply because that makes sense.
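The size check is simple arithmetic: expected bytes are roughly bitrate times duration divided by 8. A small Python sketch, with hypothetical function names and illustrative numbers chosen to match the 50MB and 65MB figures from the question (container overhead is ignored):

```python
def expected_size_bytes(bitrate_bps: float, duration_s: float) -> float:
    """Approximate media size: bits per second times seconds, converted to bytes."""
    return bitrate_bps * duration_s / 8


def size_excess(actual_bytes: float, bitrate_bps: float, duration_s: float) -> float:
    """Fraction by which the actual size exceeds the estimate (0.3 means 30% larger)."""
    expected = expected_size_bytes(bitrate_bps, duration_s)
    return (actual_bytes - expected) / expected


# Illustrative only: an 8 Mbit/s stream lasting 50 seconds is about 50 MB.
excess = size_excess(65_000_000, 8_000_000, 50)
```

With these made-up numbers, `excess` comes out at 0.3, which is the 30% discrepancy mentioned above.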
Suppose no one could access the original file, of course no one would know that the last 15MB part of that video file is actually another file, right?
No. Your secret file is encrypted, so that probably rules out any headers showing up in a hex editor, but there is a problem: the MP4 container format and its structure are well known.
You can extract all video/audio tracks and what you are left with is some metadata and your secret message, so it will be obvious that it's not supposed to be there.
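To show how little effort that takes, here is a minimal Python sketch that walks the top-level boxes of an MP4 (ISO base media) file and reports how many bytes at the end are not covered by a well-formed box. It relies on the standard box layout: a 32-bit big-endian size, a four-character type, a size of 1 meaning a 64-bit size follows, and a size of 0 meaning the box runs to the end of the file. The alphanumeric check on the box type is a heuristic added for this example, and the function name is made up.

```python
import struct


def trailing_bytes_after_boxes(data: bytes) -> int:
    """Return the number of bytes at the end that do not parse as top-level boxes."""
    pos = 0
    total = len(data)
    while pos + 8 <= total:
        size, box_type = struct.unpack(">I4s", data[pos:pos + 8])
        header_len = 8
        if size == 1:
            # A 64-bit "largesize" follows the type field.
            if pos + 16 > total:
                break
            (size,) = struct.unpack(">Q", data[pos + 8:pos + 16])
            header_len = 16
        elif size == 0:
            # Box extends to end of file, so nothing can trail it.
            return 0
        # Heuristic: top-level box types are plain ASCII letters/digits.
        if not box_type.isalnum() or size < header_len or pos + size > total:
            break
        pos += size
    return total - pos
```

A clean file yields 0; a file with an encrypted blob appended will almost always yield the blob's length, which is also the offset needed to cut it out.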
It is a valid technique, just not a very effective one.
I need to read text files (log files from a server) line by line, and they are big (about 150-200MB). I am using StreamReader, and it's great for "little" files like 12MB but not for ones this big. After some time the file is loaded and shown in my DataGridView, but it uses a huge amount of memory. I am using bindingSource.Filter on this DataGridView (there is a textbox, and as the user types, one column is filtered by string comparison so that rows not matching the textbox are hidden), and with big files that is unusable too. So I want to ask what the best solution is for me.
I was looking around and found some solutions, but I need help deciding what's best for me and with implementing it (or whether there is something else):
Load data in background and showing them in realtime. I am not really sure how to do that and I don't know what to do with filtering in this solution.
Maybe upgrade somehow streamreader? Or write own method for reading lines from file with binary readers?
I found something about memory-mapped files in C# 4.0, but I can't use 4.0. Could this feature help?
Thanks for the help
Okay, so I am implementing paging: I read 5k lines of the text file, then the next lines after a button click, and so on. I am using BaseStream.Position to save the position where reading should start, but I would like to use some other function that tracks the number of lines. Mainly, I want a method for starting to read from an exact line, but I can't find anything like that for StreamReader. Is there something like that?
Load data in background and showing them in realtime. I am not really sure how to do that and I don't know what to do with filtering in this solution.
This is no help. It will still consume a lot of memory in the background thread.
Maybe upgrade somehow streamreader? Or write own method for reading lines from file with binary readers?
Still no help, once you read the whole file into memory it will, well, consume memory.
I think you get the point. Don't load the whole file into memory. Load only chunks of it. Use paging. You cannot show 200MB worth of data on a single screen anyways, so only load the portion you need to show on the screen. So basically you need to implement the following function:
public IEnumerable<string> ReadFile(int page, int linesPerPage, out int totalLines)
{
...
}
The Skip and Take extension methods could be helpful here.
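One way to get both paging and "start reading from an exact line" is to scan the file once, remember the byte offset of every line, and then seek straight to the first line of the requested page. The sketch below shows the idea in Python rather than C#; the function names are invented for the example, the file is assumed to be UTF-8 with newline-terminated lines, and the same approach should carry over to C#.

```python
from typing import List


def build_line_index(path: str) -> List[int]:
    """Scan the file once and record the byte offset at which each line starts."""
    offsets = []
    pos = 0
    with open(path, "rb") as f:
        for line in f:
            offsets.append(pos)
            pos += len(line)
    return offsets


def read_page(path: str, offsets: List[int], page: int, lines_per_page: int) -> List[str]:
    """Return the lines of one zero-based page, reading only that part of the file."""
    first = page * lines_per_page
    if first >= len(offsets):
        return []
    count = min(lines_per_page, len(offsets) - first)
    with open(path, "rb") as f:
        f.seek(offsets[first])
        raw = [f.readline() for _ in range(count)]
    return [r.decode("utf-8", errors="replace").rstrip("\r\n") for r in raw]
```

The total line count is just `len(offsets)`, so the index also answers the `totalLines` part of the function above. The index costs one integer per line, which is far less than holding the lines themselves.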
I am wondering about the capabilities of the SWF format. I have some text in a Flash video, an email address (xyz@somewhere.com), which I would like to edit with a C# application. I have the SWF File Format Spec and was wondering if the following is possible:
I read in the tag (I am assuming this text is stored as a static text tag; correct me if I am wrong). Once I have found the correct tag for my text, I change the value in the tag and save the SWF file. Do you see any problems with this approach?
Chris
Yes, almost certainly there is more to this.
First, the swf is most likely compressed and will need decompressing.
Locating the value you need to change should not be too hard, but you will need to change at least one (probably several) field lengths to cater for the change. This in turn will probably require you to parse the rest of the SWF in order to recalculate the lengths of the various blocks that will be altered.
Not for the faint-hearted.
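The decompression step and the top-level length field can be sketched briefly. An SWF file starts with a 3-byte signature ("FWS" for uncompressed, "CWS" for zlib-compressed), a version byte, and a 32-bit little-endian length giving the size of the whole file after decompression; in a "CWS" file everything after those 8 bytes is zlib data. The Python below is a minimal sketch with made-up function names. It does not parse or patch individual tags, which is the harder part described above.

```python
import struct
import zlib
from typing import Tuple


def read_swf_body(data: bytes) -> Tuple[int, int, bytes]:
    """Return (version, declared_length, uncompressed body after the 8-byte header)."""
    signature = data[:3]
    version = data[3]
    (declared_length,) = struct.unpack("<I", data[4:8])
    if signature == b"FWS":
        body = data[8:]
    elif signature == b"CWS":
        body = zlib.decompress(data[8:])
    else:
        # Other signatures (for example LZMA-compressed "ZWS") are not handled here.
        raise ValueError("unsupported SWF signature: %r" % signature)
    return version, declared_length, body


def write_uncompressed_swf(version: int, body: bytes) -> bytes:
    """Rebuild an uncompressed file, recomputing the length field for the new body."""
    return b"FWS" + bytes([version]) + struct.pack("<I", 8 + len(body)) + body
```

After editing the body, any tag whose content changed size also needs its own length field updated before the file is written back.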
I am currently working on a project, and my goal is to locate text in an image. OCR'ing the text is not my intention yet; I basically want to obtain the bounds of text within an image. I am using the AForge.Net imaging component for manipulation. Can anyone offer some assistance?
Update 2/5/09:
I've since gone down another route in my project. However, I did attempt to obtain text using MODI (Microsoft Office Document Imaging). It allows you to OCR an image and pull text from it with some ease.
This is an active area of research. There are literally oodles of academic papers on the subject. It's going to be difficult to give you assistance, especially w/o more details. Are you looking for specific types of text? Fonts? English-only? Are you familiar with the academic literature?
"Text detection" is a standard problem in any OCR (optical character recognition) system and consequently there are lots of bits of code on the interwebs that deal with it.
I could start listing piles of links from google but I suggest you just do a search for "text detection" and start reading :). There is ample example code available as well.
Recognizing text inside an image is indeed a hot topic for researchers in that field, but it only began to grow out of control when CAPTCHAs became the "norm" as a defense against spam bots. Why use CAPTCHAs as protection? Well, because it is (or was) very hard to locate and read text inside an image!
The reason I mention CAPTCHAs is that the most advancement* has been made within that tiny area, and I think your solution is most likely to be found there.
That is especially so because CAPTCHAs are indeed about locating text (or something that resembles text) inside a cluttered image and afterwards trying to read the letters correctly.
So if you can find yourself a good open-source CAPTCHA-breaking tool, you probably have all you need to continue your quest...
You could probably even throw away the most difficult code, the part that handles the character recognition itself, because those OCR routines are built to read distorted text, something you don't have to do.
*: advancement in terms of visible, usable, and practical information for a "non-researcher"
If you're ok with using an online API for this, the API at http://www.wisetrend.com/wisetrend_ocr_cloud.shtml can do text detection in addition to just OCR.
Stroke width transform can do that for you. That's at least what MS developed for their mobile phone OS. A discussion on the implementation is here at https://stackoverflow.com/