Decode Opera cache content - c#

I found several browser cache viewers at http://www.nirsoft.net. My question concerns only Opera: which APIs, functions, or methods are used to decode the .tmp content (Opera cache files) into URLs? I would be grateful for any help or explanation.

This seems to work pretty nicely for me:
strings .opera/cache/dcache4.url | egrep -o '(https?|ftp)://.*$'
Returns 1944 urls on separate lines for me. If you look at the output of strings you'll find that it looks pretty easy to find out which .tmp file under .opera/cache is related to which url too.
--
strings is a UNIX utility in binutils; the source code is pretty simple and can be found here among other places
egrep just matches a regular expression against the strings; System.Text.RegularExpressions.Regex.Match would do exactly the same from C#
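
For a C# equivalent of the pipeline above, here is a minimal sketch. It imitates strings by collecting runs of printable ASCII bytes and then applies the same regular expression as the egrep call. The class name CacheUrlScanner, its method names, and the minimum run length of 4 are assumptions made for illustration; nothing here comes from Opera's documented file format.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: a rough stand-in for `strings | egrep`, not Opera's own decoder.
public static class CacheUrlScanner
{
    // Same pattern as the egrep call in the answer.
    private static readonly Regex UrlPattern = new Regex(@"(https?|ftp)://.*$");

    // Yields every run of at least minLength printable ASCII bytes, like `strings` does.
    public static IEnumerable<string> PrintableRuns(byte[] data, int minLength = 4)
    {
        var run = new StringBuilder();
        foreach (byte b in data)
        {
            if (b >= 0x20 && b <= 0x7E)
            {
                run.Append((char)b);
                continue;
            }
            // A non-printable byte ends the current run.
            if (run.Length >= minLength) yield return run.ToString();
            run.Clear();
        }
        if (run.Length >= minLength) yield return run.ToString();
    }

    // Returns the URL part of every printable run that contains one.
    public static List<string> ExtractUrls(byte[] data)
    {
        var urls = new List<string>();
        foreach (string text in PrintableRuns(data))
        {
            Match m = UrlPattern.Match(text);
            if (m.Success) urls.Add(m.Value);
        }
        return urls;
    }
}
```

It could be called as CacheUrlScanner.ExtractUrls(File.ReadAllBytes(path)), with path pointing at the dcache4.url file mentioned above.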

Opera publicly documents the file format used by the cache file, which should help. (If it doesn't, say so (and why!), and I can push to get the documentation improved!)

Related

C# Custom OCR that returns a formatted string

This is just for my personal interest. I see from my research that it's not that easy to start your own OCR. However, I would like to hear ideas on how to tackle the challenge of not just recognising characters, but also returning the results as a formatted string.
For example, I have an image of a table (imagine that it's an image with "|" and "_" being drawn straight lines):
|Number, AnotherNumber|Some Text|
|1,4 |Blah |
And after using a silent OCR, I would get the result "|Number, AnotherNumber|SomeText|\n|1,4|Blah|".
Any ideas on how I could achieve this, and which tools or libraries I could use? I would like to write this in C# with Visual Studio 2010, ideally working with PDFs, though other image formats are fine. I've already looked at some libraries, but they seem incompatible because they use C++ or C.
Thank you.
Alina.
Getting OCR libraries is quite hard (at least if you don't want to pay for one).
You could try this one; it's not free, but it's an option if you already have Office 2007:
http://www.codeproject.com/Articles/41709/How-To-Use-Office-2007-OCR-Using-C

Text files to test the functionality of a search engine

To practice for an upcoming programming contest, I'm making a very basic search engine in C# that takes a query from the user (e.g. "Markov Decision Process") and searches through a couple of files to find the one most relevant to the query.
The application seems to be working (I used a term-document matrix algorithm).
But now I'd like to test the search engine to see if it really is working properly. I tried saving a couple of Wikipedia articles as .txt files and testing with those, but I just can't tell whether it's fast enough (even with some timers).
My question is, is there a website that shows a couple of files to test a search engine on (along with the logically expected result)?
I'm testing with common sense so far, but it would be great to be sure of my results.
Also, how can I get a collection of .txt files (maybe 10 000+ files) about various subjects to see if my application runs fast enough?
I tried copying a few Wikipedia articles, but it would take way too much time to do. I also thought about making a script of some sort to do it for me, but I really don't know how to do that.
So, where can I find a lot of files with separated subjects?
Otherwise, how can I benchmark my application?
Note: I guess a simple big .txt file where each line represents a "file" about a subject would do the job too.
One source of text files would be Project Gutenberg. They supply CD/DVD images if you want to download thousands of files at once. (The page doesn't state it, but I would imagine they are in txt format inside the CD/DVD iso.)
You can get Wikipedia pages by using a recursive function that loads the HTML of every page linked to from a chosen start page.
If you have some experience with C#, this should help you:
http://www.csharp-station.com/HowTo/HttpWebFetch.aspx
Then loop through the text and collect all the instances of the text: "<a href=\""
and recursively call that method on each link. You should also use a counter to limit the number of recursions.
Also, to prevent OutOfMemory exceptions, you should pause every fixed number of iterations, write everything collected so far to a file, and then clear the old data from the string.
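
The approach above could be sketched roughly as follows. This version swaps the recursion for a queue with a page limit, which bounds the work the same way a recursion counter would, and it saves each page as soon as it is fetched so nothing accumulates in memory. The names PageCollector, ExtractLinks and Crawl, and the fetch and save delegates, are hypothetical; the caller supplies the actual download and file-writing code, and relative links are not resolved here.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sketch of a bounded link-following collector.
public static class PageCollector
{
    // Captures the target of every <a ... href="..."> in the HTML.
    private static readonly Regex Href =
        new Regex("<a\\s+[^>]*href=\"([^\"]+)\"", RegexOptions.IgnoreCase);

    public static List<string> ExtractLinks(string html)
    {
        var links = new List<string>();
        foreach (Match m in Href.Matches(html))
            links.Add(m.Groups[1].Value);
        return links;
    }

    // fetch: downloads a page (assumed to be supplied by the caller).
    // save: stores one page, e.g. by writing it to a .txt file.
    public static void Crawl(string startUrl, int maxPages,
                             Func<string, string> fetch,
                             Action<string, string> save)
    {
        var seen = new HashSet<string> { startUrl };
        var queue = new Queue<string>();
        queue.Enqueue(startUrl);
        int fetched = 0;

        while (queue.Count > 0 && fetched < maxPages)
        {
            string url = queue.Dequeue();
            string html = fetch(url);
            fetched++;
            save(url, html); // written out immediately, so pages do not pile up in memory

            foreach (string link in ExtractLinks(html))
                if (seen.Add(link)) // skip links that were already queued
                    queue.Enqueue(link);
        }
    }
}
```

A real crawl would also need to turn relative links into absolute URLs and strip the HTML markup before the text is fed to the search engine.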
You can use the datasets from GroupLens Research's site.
Some samples: movies, books

signing firefox extension

Guys, I need some help. For days now I have been looking for a way to sign a Firefox XPI file,
but I haven't found anything that works (including here). The posts I found were very old
and not compatible with the new Firefox version.
Does anybody here know how to do it?
Thanks in advance.
P.S.
I want to write a packer/signer in C#.
Edit:
I'm using the McCoy CA, which MDN says is valid.
I know there is a Python script that signs add-ons, but I don't know Python, so please advise something else, and for that matter I would prefer not to use Java...
If you are asking for a code example in Java, there is XPISigner. However, its source code seems rather complicated, so you might have better chances if you look at the signature format description and the simple Python example script. It is mostly simple; the only "complicated" part is generating a detached RSA signature of the META-INF/zigbert.sf file (stored in META-INF/zigbert.rsa). Note that META-INF/zigbert.rsa has to be the first file in the XPI archive.
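
Since the question asks for C#, here is a minimal sketch of the packing step only, showing one way to make sure META-INF/zigbert.rsa ends up as the first entry. It assumes the signature bytes and the other META-INF files have already been produced elsewhere; it does not create the RSA signature. XpiPacker and Pack are hypothetical names, and ZipArchive requires .NET 4.5 or later.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical sketch: packs already-prepared files into an XPI (zip) archive.
public static class XpiPacker
{
    public const string SignatureEntry = "META-INF/zigbert.rsa";

    // files maps archive entry names to their contents.
    public static byte[] Pack(IDictionary<string, byte[]> files)
    {
        if (!files.ContainsKey(SignatureEntry))
            throw new ArgumentException("Missing " + SignatureEntry);

        // Sort so that the signature file comes first; the rest keep their relative order.
        var ordered = files.OrderBy(f => f.Key == SignatureEntry ? 0 : 1);

        using (var buffer = new MemoryStream())
        {
            // leaveOpen: true keeps the buffer usable after the archive is finalized.
            using (var zip = new ZipArchive(buffer, ZipArchiveMode.Create, true))
            {
                foreach (var file in ordered)
                {
                    ZipArchiveEntry entry = zip.CreateEntry(file.Key);
                    using (Stream target = entry.Open())
                        target.Write(file.Value, 0, file.Value.Length);
                }
            }
            return buffer.ToArray();
        }
    }
}
```

The returned bytes can be written to disk with File.WriteAllBytes. Whether Firefox accepts the result still depends on the signature files being generated correctly.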

get all strings in c# code file

We have been asked to provide all of the possible error messages in our code for support purposes.
Unfortunately they are not all located in resource files so I figure that if we can get a list of all of the strings in the app we can then filter out the error messages from there.
Is there anything that would let me do this in a C# app?
Cheers
How about using the Find function?
You can also use regular expressions if you have a pattern, and make your search more detailed, by changing the "Use" option to "Regular Expressions".
If you have ReSharper 5 you can use their localization feature to help you do this.
Enable localization for your project, then right click the project and select Find Code Issues. It will list all instances of a string hardcoded into the application. (Unless you have Localizable(false) set)
If you can think of a consistent string that you use on each message line (e.g. "throw new Exception(" or "MessageBox.Show("), it may be as simple as hitting Ctrl+Shift+F in Visual Studio (Find in Files), typing it in, then copying the results to a file.
Before you jump into Regex land, check this out: Regex to parse C# source code to find all strings
I'm sure there are some regular expressions or the like that you could run on your code base to catch most strings. Seeing as this is a business requirement and you're likely to be repeating this in the future, I'd refactor to get all my error messages into a structured format first. Then, automate the analysis of the structured format.
Resource files might be appropriate.
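
As a rough illustration of the regular-expression route, the sketch below pulls regular and verbatim string literals out of C# source text. It is only an approximation: it also matches literals inside comments, and a character literal containing a double quote will confuse it, which is part of why a structured format is the safer long-term fix. StringLiteralFinder and Find are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sketch: approximate extraction of string literals from C# source.
public static class StringLiteralFinder
{
    // First alternative: verbatim literals (@"..." with "" as an escaped quote).
    // Second alternative: regular literals ("..." with backslash escapes, single line).
    private static readonly Regex Literal = new Regex(
        @"@""(?:[^""]|"""")*""|""(?:\\.|[^""\\\r\n])*""");

    // Returns each literal exactly as written in the source, quotes included.
    public static List<string> Find(string source)
    {
        var found = new List<string>();
        foreach (Match m in Literal.Matches(source))
            found.Add(m.Value);
        return found;
    }
}
```

Running Find over File.ReadAllText for each .cs file in the project gives a raw list that can then be filtered down to the error messages by hand.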

How to detect if a JS is packed already

Hey guys.. I am writing a Windows application in C# that minifies CSS files and packs JS files as a batch job. One hurdle for the application is, what if the user selects a JavaScript file that has already been packed? It will end up increasing the file size, defeating my purpose entirely!
Is opening the file and looking for the string eval(function(p,a,c,k,e,d) enough? My guess is no, as there are other JS packing methods out there. Help me out!
One might suggest that you compare the size of the pre and post packed JS and return/use the smaller of the two.
UPDATE based on question in comment by GPX on Sep 30 at 1:02
The following is a very simple way to tell. There may be different, or more accurate, ways of determining this, but this should get you going in the right direction:
var unpackedJs = File.ReadAllText(...);
var unpackedSize = unpackedJs.Length;
var packedJs = ... // Your packing routine
// Keep whichever version is smaller.
File.WriteAllText(pathToFile, unpackedSize < packedJs.Length ? unpackedJs : packedJs);
I would check file size and lines of code (e.g. average line length). These two pieces of information should be enough to tell whether the code is sufficiently compact.
Try this demo.
I direct you to a post that suggests packing is bad.
http://ejohn.org/blog/library-loading-speed/
Rather use minification. Google Closure compiler can do this via a REST web service. Only use a .min.js extension for minified (not packed).
Gzip will do a better job, and the browser will decompress it. It's best to switch on gzip compression on the server, which will shrink a minified file even further.
Of course this raises the question 'How can I tell if my Javascript is already minified!'
When you create/save a minified file, use the standard file name convention of "Filename.min.js". Then when they select the file, you can check for that as a reliable indicator.
I do not think it is wise to go overboard on the dummy-proofing. If a user (who is a developer, at that), is dumb enough to double-pack a file, they should experience problems. I know you should give them the benefit of the doubt, but in this case it does not seem worth the overhead.
If you're using a safe minimization routine, running it on an already-minified file should give output that is the same as the input. I would not recommend the routine you mention. Microsoft's Ajax Minifier is a good tool and even provides DLLs to use in your project. This would make your concern a non-issue.
I would suggest adding a '.min' prefix to the extension of the packed file, something like 'script.min.js'. Then just check the file name.
Other than that, I would suggest checking how long the lines are and how many spaces are used. Minified/packed JS typically has very long lines and almost no spaces (those that remain are typically inside strings).
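
The checks suggested in these answers (the packer signature from the question, the .min.js file name, line length, and whitespace) could be combined along these lines. This is a heuristic sketch, not a reliable detector: the class name JsPackHeuristics and both thresholds (an average line length of 200 characters and a 5% whitespace ratio) are assumptions chosen for illustration and would need tuning against real files.

```csharp
using System;
using System.Linq;

// Hypothetical heuristics for guessing whether a JS file is already packed or minified.
public static class JsPackHeuristics
{
    public static bool HasMinSuffix(string fileName)
    {
        return fileName.EndsWith(".min.js", StringComparison.OrdinalIgnoreCase);
    }

    // The signature mentioned in the question; other packers will not contain it.
    public static bool HasPackerSignature(string js)
    {
        return js.Contains("eval(function(p,a,c,k,e,d)");
    }

    public static double AverageLineLength(string js)
    {
        string[] lines = js.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);
        return lines.Length == 0 ? 0 : lines.Average(line => (double)line.Length);
    }

    public static double WhitespaceRatio(string js)
    {
        return js.Length == 0 ? 0 : (double)js.Count(c => char.IsWhiteSpace(c)) / js.Length;
    }

    // Threshold defaults are illustrative assumptions, not measured values.
    public static bool LooksPacked(string fileName, string js,
                                   double minAverageLineLength = 200,
                                   double maxWhitespaceRatio = 0.05)
    {
        if (HasMinSuffix(fileName) || HasPackerSignature(js)) return true;
        return AverageLineLength(js) >= minAverageLineLength
            && WhitespaceRatio(js) <= maxWhitespaceRatio;
    }
}
```

Because any of these signals can be wrong, comparing the size before and after packing, as suggested earlier in the thread, remains a useful final safeguard.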
