I am wondering the capabilities of the SWF Format. I have some text in a Flash Video, which is an email address (xyz#somewhere.com) which I would like to write a C# application to edit. I have the SWF File Format Spec and was wondering if the following is possible:
If I read in the Tag (I am assuming this text is stored as a Static Text Tag, correct me if I am wrong). Once I found the correct tag for my text I then change the value in the tag and save the SWF file. Do you see any problems with this approach?
Chris
Yes, almost certainly there is more to this.
First, the swf is most likely compressed and will need decompressing.
Locating the value you need to change should not be too hard, but you will need to change at least one (probably several) field lengths to cater for the change. This in turn will probably require you to parse the rest of the SWF in order to recalculate the lengths of the various blocks that will be altered.
Not for the faint-hearted.
Related
Good morning everbody,
I would like to know if there is a possibility to add an invisible text block to a text file. It could be a header or at the bottom of the file, it does not matter.
It should work like this, I open a text file I add my normal text, and before closing the file I add the secret text.
When I open it again I should be able to detect if there is any secret block that is written.
Is that doable? Is there any library that does this? I cannot find anything close to this (I work with C#)
Thanks,
Eric
No, this is not possible. If the data is saved in a file, it can be read. Hiding of data is impossible. It is either present (and thus readable), or not.
You only option would be to encrypt your data as mikelegg has commented above.
Currently, I can extract all the text chunks with their location data from a PDF. The problem is that the PDF contains images with text annotations which I do not want including in the extraction.
However, for whatever reason whenever I search the PDF for images, it only finds 1 of the images and usually throws the exception: The colour space is not supported. It's as if it doesn't recognise them as images?
I am not wishing to extract the images, just locate where they start and end in relation to the PDF so I can exempt the text that is on top of the images.
For example:
Where the numbers on the graph are unwanted and need to be removed from the extracted text.
Im just not sure how to:
A) Locate all the images and store the coordinates of where it starts and ends
B) Ignore the text that is on top of the images in the PDF document
(I am using iTextSharp to try and achieve this, but so far I am not having much luck)
I'm not exactly sure how iTextSharp works but the PostScript language reference or the PDF Reference manuals may be a good place to start figuring out what you need to know.
I just cracked open a PDF file in a text editor to check out the format because I haven't seen it in a while and then realized what the problem might be.
PDFs support "Images", and "Stream Objects" which can contain image data. Stream objects actually declare enough information that you can know where they begin and end and write something to manually ignore them.
A Stream Object Header looks like this:
<</Intent/RelativeColorimetric/Subtype/Image/Length 19678/Filter/DCTDecode/Name/X/Metadata 4314 0 R/BitsPerComponent 8/ColorSpace 5247 0 R/Width 290/Height 372/Type/XObject>>stream
It's entirely possible that your particular PDF has only one "Image" and then the rest of it is "Streams".
I suggest cracking it open to take a look. It would also be beneficial if you included some sample code with on the library you're using.
I also found by opening a PDF in a text editor this string /Type /Page which seems to create new pages, so you there's a chance you could count those to determine which page you're currently on.
The header at the top of the document I'm reviewing is %PDF-1.2 and the latest version is 1.7, so there may be some disparity here because of that.
Any chance you can share the PDF file you're working with?
I need a help in developing a Windows Appl using C#.NET VS2010. The functionality is very simple, the user will input a text file and my program is supposed to extract the relevant data from the text file and output it to either csv or text or whatever.
My biggest problem whenever I deal with text files is the format. Even though if you open the input text file in a Notepad or Wordpad it looks perfect, the layout etc. But once we start programming it I realize that what I am seeing is not the way the data is stored inside the file. I read many articles on Unicode/UTF etc.. etc.. but I dont have a definite solution to know exactly what my file format is. So the end result is that I end up getting many exceptions.
In Unix Shell Scripting it used to be simple. There is some good Unix command like less which is similar to more but it also display any formatting characters inside the file. Also there are some useful commands like unix2dos and dos2unix.
Nevertheless, is there some program/code or professional method which can find the exact file formatting of my input file and then reformat it to "plain text" so that the data extraction becomes easy and bug-free.
Thanks
I have already seen solution for hiding text files or messages within Image or audio files..
but i want solution for hiding text file within another text files (.txt, .doc, .pdf).
can somebody help for this??
Steganography is based on slightly changing data to "hide" some other set of data within these changes. That's why an image with steganography is slightly different than the original. You can't notice if if you don't know it's there, but the fact is you saved the data as changes within color information of pixels.
.txt file is nothing else than a big hunk of characters. If you tried to somehow change the data to hide something in it, it would result in unreadable text. If you change the color of a pixel from 215 Red to 217 Red, you won't really notice. But changing A to F or Ł is quite noticable.
So no, I don't believe it can be done. At least not with .txt files.
While I agree with #stonehead that at the end of the day if you put something in the file someone can find it, but there are a few tricks out their that may prove to be viable options.
Since most users are not living in their command prompt the most straight forward approach is to misrepresent the file to the GUI. This is a pretty handy trick for this.
http://www.howtogeek.com/howto/windows-vista/stupid-geek-tricks-hide-data-in-a-secret-text-file-compartment/
If you are storing data in a pdf you should have very little problems. I would use PdfClown. Not to get too into it but you will want to read up about the structure of a pdf. With clown pdf you could store an asset inside with no connection to the presentation layer. Given the complexity of pdf files i will almost bet no one will be looking in, i would base64encode the chunk to have it blend in with images and other data it would be difficult for someone to find it by just opening up the file.
Be For Warned ClownPDF C# library is not for the faint of heart and it will help to have some java experience because a lot of their docs are for java.
Hope these options help.
I have a PDF and want to extract the text contained in it. I've tried a few different PDF libraries and they all return basically the same results. When extracting the text from a two page document with literally hundreds of words, only a dozen or so words from the header are returned.
Is there any way to tell if the text I'm after is actually text or a raster image of the text? I'm thinking something along the lines of Firebug's "Inspect Element" but at this point I'll take any solution that tells what I'm really looking at.
This project really doesn't justify attempting to use OCR. And, although a simple solution, using fields in the PDF is not an option since the generator of the file is a third party.
If Acrobat/Reader can select the text, then it Is Text.
Reasons your library might not be able to find the text in question:
Complex/bad fonts or encodings. Adobe can be very forgiving of garbage in, somehow managing to get Good Info out.
The text could be in an annotation rather than the page contents. It won't matter what program parses the content stream if you need to look in the annot array instead.
You didn't name a particular library, so it's possible that the library you're using doesn't look inside XObject Forms. That's unlikely in an even remotely mature API, but stranger things have happened.
If you can get away with copy/pasta from Reader, then just go that route.
Have you tried Amyuni PDF Creator .Net? It allows you to enumerate all components from a specified rectangular region of a page and inspect their type from a predefined types list. You could run a quick test using the trial version and the following code sample for text extraction:
// open a PDF file
axPDFCreactiveX1.Open(System.IO.Directory.GetCurrentDirectory()+"\\sampleBookmarks.pdf", "");
axPDFCreactiveX1.Refresh ();
String text = axPDFCreactiveX1.GetRawPageText (1);
MessageBox.Show (text);
Additionally, it provides Tesseract OCR integration in case you needed it.
Disclaimer: I am part of the development team of this product.
Check this site out. It may contain some helpful code snippets. http://www.codeproject.com/KB/cs/PDFToText.aspx