How to reduce the Recaptcha difficulty?

I use RecaptchaControl, and the users complain that the image is not that clear (the black part in the captcha is too hard to read). Is there any property to make the image less difficult (less noisy)?
<recaptcha:RecaptchaControl ID="recaptcha" runat="server" PublicKey="XXX" PrivateKey="YYY" OverrideSecureMode="true" />

To put it simply, no.
As others have said you can only customise the UI.

I'm the author of the reCAPTCHA API tutorial on web-development-blog.com, and I must say the text in the images reCAPTCHA creates is often very hard to read. While using the API I sometimes got better results by using a different language for the challenges.
In one of my projects I got a lot of complaints and switched to a different script:
http://code.google.com/p/cool-php-captcha/
The challenges are less hard to solve and the script is very easy to use.

Possibly not what you are looking for, but there are lots of custom captcha controls out there, e.g.
CaptchaNET_2.aspx
With a bit of knowledge of the Graphics classes in C#, it isn't hard to tweak the code to make the captcha simpler (or harder). We use one for a mobile phone web site, where that control is important; otherwise the captcha is unreadable on lower-res phones.

In some reCaptchas (I'm guessing this is somehow configurable), you only have to get one word right (I'm not sure how close the second word must be).
I've tested in the reCaptcha in my own website and it worked (two words wrong = fail, one word right + one wrong = success); tested in stackoverflow's reCaptcha and it didn't... I had to get both words right. That's why I think it's configurable.
I found this post, along with the one below (where I read that only one word had to be right), precisely because I too find it too hard to get through reCAPTCHAs; too often I have to refresh more than 5 times to get a readable pair. Other post:
Are reCAPTCHA CAPTCHAs getting harder or is just me

Google now allows you to change the difficulty setting.

Related

Connecting To A Website To Look Up A Word(Compiling Mass Data/Webcrawler)

I am currently developing a Word-Completion application in C#, and after getting the UI up and running, keyboard hooks set, and other things of that nature, I came to the realization that I need a WordList. The only issue is, I can't seem to find one with the appropriate information. I also don't want to spend an entire week formatting and gathering a WordList by hand.
The information I want is something like "TheWord, The definition, verb/etc."
So, it hit me. Why not download a basic word list with nothing but words (already did this; there are about 109,523 words), write a program that iterates through every word, connects to the internet, retrieves the data (definition etc.) from some arbitrary site, and creates XML data from said information? It could be 100% automated, and I would only have to wait for maybe an hour depending on my internet connection speed.
This, however, brought me to a few questions.
How should I connect to a site to look up these words? (This is my actual question.)
How would I read this information from the website?
Would I piss off my ISP or the website for that matter?
Is this a really bad idea? Lol.
How do you guys think I should go about this?
EDIT
Someone noticed that Dictionary.com uses the word as a suffix in the url. This will make it easy to iterate through the word file. I also see that the webpage is stored in XHTML(Or maybe just HTML). Here is the source for the Word "Cat". http://pastebin.com/hjZj6AC1

For what you marked as your actual question - you just need to download the data from the website and find what you need.
A great tool for this is CsQuery which allows you to use jquery selectors.
You could do something like this:
var dom = CQ.CreateFromUrl("http://www.jquery.com");
string definition = dom.Select(".definitionDiv").Text();
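The loop described in the question (build a URL per word, download the page, emit XML) could be sketched roughly as follows. This is only a sketch under assumptions: the base URL is a placeholder for a site that takes the word as a URL suffix, the WordLookup class and its method names are made up for illustration, and parsing the definition out of the downloaded HTML is left to whatever selector library you choose. The delay between requests is there so the target site is not hammered with 100,000 back-to-back requests.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;
using System.Xml.Linq;

public static class WordLookup
{
    // Hypothetical base URL: stands in for a site that uses the word as a URL suffix.
    public const string BaseUrl = "https://example.com/browse/";

    private static readonly HttpClient Client = new HttpClient();

    // Builds the lookup URL for one word, escaping characters that are unsafe in a path.
    public static string BuildUrl(string word)
    {
        return BaseUrl + Uri.EscapeDataString(word.Trim().ToLowerInvariant());
    }

    // Downloads the raw HTML for each word, pausing between requests to stay polite.
    public static async Task<Dictionary<string, string>> DownloadAllAsync(
        IEnumerable<string> words, TimeSpan delay)
    {
        var pages = new Dictionary<string, string>();
        foreach (string word in words)
        {
            pages[word] = await Client.GetStringAsync(BuildUrl(word));
            await Task.Delay(delay);
        }
        return pages;
    }

    // Wraps one parsed entry in an XML element; XElement escapes the text content.
    public static XElement ToXml(string word, string definition, string partOfSpeech)
    {
        return new XElement("entry",
            new XElement("word", word),
            new XElement("definition", definition),
            new XElement("partOfSpeech", partOfSpeech));
    }
}
```

With a one-second delay, 109,523 words would take over 30 hours rather than one, so it is worth checking the site's terms of use, or whether it offers an API or a downloadable data set, before starting.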

Locating heading text using CSS

I have some C# code which will verify the heading text on a web page, currently located via xpath as follows.
Assert.AreEqual("Permissions", driver.FindElement(By.XPath(".//*[@id='navigation']/li[6]/h3")).Text);
This, as I understand, will check the text found at the end of the XPath matches the word "Permissions".
The above currently works, but I would rather use CSS locators. I hear it's best not to use XPath if possible.
I'm new to website testing so am not yet familiar with all this, any help will be much appreciated.
Let me know if there is more you require than what is provided above or if you have any alternate suggestions to the method already used.

I may not fully understand what you are trying to do, but since this has no answer after 7 hours I thought I might at least mention that if you are trying to get the inner HTML of the header, you can use the Html Agility Pack (http://htmlagilitypack.codeplex.com/). It's very easy to use. It might not be what you are looking for, though.

Why use window.location in a hyperlink?

I was going through a website I've taken over and came across this section in one of the pages:
<a href="javascript:window.location='<%=GetSignOutUrl()%>';">
// img
</a>
Apparently anyone who has ever used the site without javascript would not be able to log out properly (surprisingly enough, this has never come up).
So the first thing that comes to mind is
<a href="<%=GetSignOutUrl()%>" onclick="javascript:window.location='<%=GetSignOutUrl()%>';">
// img
</a>
Then I realized I don't know why I'm keeping the javascript call around at all. I'm just a little confused as to why it would have been written like that in the first place when a regular link would have worked just fine. What benefit does window.location have over just a regular link?
This is also the only place in the website I've seen something like this done (so far).
Edit: The programmer before me was highly competent, which is actually why I was wondering if there was something I wasn't taking into account or if he just made a simple oversight.

There are three possibilities:
The developer was trying to enforce Javascript use before sending the user along.
The developer was trying to mask the href in the link. Perhaps this was so it wouldn't be crawled effectively, or the status bar had something to do with it.
The developer was a non-conformist.
I would remove it and see if it breaks. But then again, I'm a conformist.

My guess is that if the developer didn't know to consider the client's capability of executing javascript, they might not have known what an href is. It's unlikely but not impossible.

Could it be that multiple domains were in use, and which one applied was unclear or not easily available in the code?

This might be an attempt to hide the link from search engines.

Reading Character from Image

I am working on an application which requires matching of numbers from a scanned image file to database entry and update the database with the match result.
Say I have an image, employee1.jpg. This image will have two handwritten entries: the employee number and the amount to be paid to the employee. I have to read the employee number from the image, query the database for that number, and update the employee with the amount to be paid as read from the image. Both the employee number and the amount to be paid are written inside two boxes at a specified place on the image.
Is there any way to automate this? Basically I want a solution in .NET using C#. I know this can be done using artificial neural networks.
Any ideas would be much appreciated.

You can use the Microsoft Office Document Imaging Library (MODI), which is included with Office 2003/2007.
Links:
OCR with Microsoft® Office - Code Project - example of using MODI
Microsoft Office Document Imaging - Wikipedia - contains a simple example in VB.NET

Pattern recognition is a standard example when neural networks are studied. I don't know whether there is a library/framework for working with AI in C#. If you find one, the first thing you have to do is train the network (supervised learning), and for this you need to prepare a large sample set of images; the more examples, the more accurate the result. On the other hand, you can use OpenCV (C/C++, Python and Java), a library specialized in computer vision that has a module implementing AI methods.
Have a nice day!
Oscar.

I think this is very hard to automate. The problem is that you need some kind of very good OCR software. And even if you have it, what if it reads something wrong because of someone's messy handwriting? If the ID is wrong, the payment is booked to the wrong employee, and if the amount is wrong, he gets the wrong salary!
Neither is something you want to happen. To see how hard good OCR is to find, just look at how a captcha works. The principle is nothing more than an image of hard-to-read text.
So my opinion is that you can't really automate this process. What you can do is write a program that assists a human in entering the values manually (also take a look at Amazon Mechanical Turk):
Show on the right the picture with the handwritten values, or if they are always on the same position or specially marked (with a box around them, etc.), try to find these places automatically and show them to the user.
On the left offer two textboxes, where the user can enter the values.
To get this to a fast and fluent process, you have to take great care about how the user can enter easily the values by just using the keyboard:
When showing a new picture, set the focus to the id textbox
If the user id is always a specific length, switch to the next box if all numbers are entered
(If you allow this, a backspace in the empty next box should focus back to the previous one)
Otherwise allow a change to the next textbox by hitting tab or return
Normally these textboxes are arranged above each other (not side by side), thus you should support switching between them using the up down arrow keys.
After finishing the entry in the last textbox automatically show the next image.
Also, on a fresh new entry (nothing entered yet), allow an easy switch back to the previous entry using the backspace or left arrow key.
With such a process a single person can enter many entries into your database, and the cost is much lower than that of finding wrong entries in your database afterwards.
Just a last suggestion:
Because this is a boring process for a human, which can easily lead to errors, maybe let two people enter these values and accept a value as approved only if both enter the same thing. This should lead to a correct rate somewhere above 99%. If you need absolutely 100%, think about letting 4-5 people check each entry and accept it only if all of them enter the same values. To also get a comparison of how good your OCR software would be, run it over your images as well and compare its results to the human-entered values to get an idea of when you can really rely on OCR alone.
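The double-entry check suggested above can be sketched in a few lines of C#. This is an illustration only: the DoubleEntry class, the IsApproved method, and the tuple shape (employee number as a string, amount as a decimal) are assumptions made for the example, not part of any existing library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DoubleEntry
{
    // Returns true only when at least minOperators people keyed the entry
    // and every one of them typed exactly the same employee number and amount.
    public static bool IsApproved(
        IReadOnlyList<(string EmployeeNumber, decimal Amount)> keyed,
        int minOperators = 2)
    {
        if (keyed.Count < minOperators)
        {
            return false; // not enough independent entries yet
        }

        var first = keyed[0];
        return keyed.All(e => e.Equals(first));
    }
}
```

For example, two operators who both key ("1042", 250.00m) produce an approved entry, while a mismatch in either field sends the image back for another pass. Raising minOperators to 4 or 5 gives the stricter variant mentioned above.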

OCR engines are not trained to read handwritten text, so you might have trouble with MODI. You want to try to find an ICR engine. Even so, the best of these are only 80% accurate on good inputs. You might get better results, since you know that your text is always numbers.
This SO question/answer says that OCROpus has ICR
FOSS Intelligent Character Recognition (ICR)

There is the LeadTools SDK for OCR/ICR. It is very handy for recognising handwritten characters. I am doing a feasibility study with it, and so far I think it will work out. LeadTools provides components that can be used in your application, and it supports C, C++, C#, VB.Net etc.
You can visit the following link for this:
http://www.leadtools.com/downloads/default.htm?category=

How to allow simple HTML tags in comments or anywhere?

In my web application I am developing comment functionality, where users can comment. But I am facing a problem: I want to allow simple HTML tags in the comment box, tags like <b>, <strong>, <i>, <em>, <u>, etc., that are normally allowed in a comment box. I also want line breaks entered by the user to be automatically converted into <br /> tags and stored in the database, so that when I display the comments on the web page they look the way the user entered them.
Can you please tell me how to restrict the user's input to the allowed set of HTML tags, how to convert line breaks into <br /> tags, and how to then store the result in the database?
Or, if anyone has a better idea or suggestion for implementing this kind of functionality, I'd like to hear it. I am using ASP.NET 2.0 (C#).
I noticed that StackOverflow.com does the same thing in profile editing. When we edit our profile, the line "basic HTML allowed" is written below the "About Me" field. I want almost the same functionality.

I don't have a C# specific answer for you, but you can go about it a few different ways. One is to let the user input whatever they want, then you run a filter over it to strip out the "bad" html. There are numerous open source filters that do this for PHP, Python, etc. In general, it's a pretty difficult problem, and it's best to let some well developed 3rd party code do this rather than write it yourself.
Another way to handle it is to allow the user to enter comments in some kind of simpler markup language like BBCode, Textile, or Markdown (stackoverflow is using Markdown), perhaps in conjunction with a nice Javascript editor. You then run the user's text through a processor for one of these markup languages to get the HTML. You can usually obtain implementations of these processors for whatever language you are using. These processors usually strip out the "bad" HTML.

It's rather "simple" to do that in PHP and Python due to the large number of built-in functions. I am still learning C# and haven't yet come across an equivalent function, but chances are that it exists and all you need to do is search for it. I mean a function that can take the user input, search for the allowed tags (which are in an array, of course), and replace the <> with something else like [], then use a function to escape the other HTML tags. In PHP we use htmlentities().
Something like
$txt = $_POST['comment'];
// Swap the allowed tags for placeholder tokens so they survive escaping
$txt = str_replace(array('<b>', '</b>'), array('[b]', '[/b]'), $txt);
// Escape everything that is left
$securetxt = htmlentities($txt);
// Turn the placeholder tokens back into the allowed tags
$finaltxt = str_replace(array('[b]', '[/b]'), array('<b>', '</b>'), $securetxt);
//Now save to Db

I'm not sure, but I think you have to escape html characters when inserting in database and when retrieving echo them unescaped, so the browser can see it just like html.

I don't know ASP.NET, but in PHP there's an easy function, strip_tags, that lets you add exceptions (in your case, b, em, etc.). If there's nothing like that in C#, you can write a regular expression that strips out all tags except the allowed ones, but chances are that such an expression already exists, so it should be easy to find.
Replacing \n (or something similar) with br shouldn't be a problem either, with a simple search and replace.
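For the C# side of the question, the whitelist idea from these answers might look roughly like the sketch below: escape everything first, then restore only the allowed tags, then convert newlines. The CommentSanitizer class and Sanitize method are names invented for this example, and it assumes System.Net.WebUtility is available (.NET 4.0 or later); on ASP.NET 2.0 the equivalent call would be HttpUtility.HtmlEncode from System.Web.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class CommentSanitizer
{
    // Matches an encoded opening or closing tag from the whitelist,
    // e.g. "&lt;b&gt;" or "&lt;/em&gt;". Tags carrying attributes
    // (such as <b onclick=...>) do not match and therefore stay escaped.
    private static readonly Regex AllowedTag = new Regex(
        @"&lt;(/?)(b|strong|i|em|u)&gt;", RegexOptions.IgnoreCase);

    public static string Sanitize(string rawComment)
    {
        // 1. Escape everything so no user-supplied markup survives.
        string encoded = WebUtility.HtmlEncode(rawComment);

        // 2. Turn only the whitelisted, attribute-free tags back into real tags.
        string withTags = AllowedTag.Replace(encoded, "<$1$2>");

        // 3. Normalise line endings, then convert each newline to a break tag.
        return withTags.Replace("\r\n", "\n").Replace("\n", "<br />");
    }
}
```

Calling CommentSanitizer.Sanitize on a comment containing <b> and <script> leaves the bold tags intact and the script tag escaped. Note that this sketch does not check that tags are balanced, so a stray </b> is passed through as-is; a maintained sanitizer library or a markup language such as Markdown remains the safer choice for production use.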

This is a dangerous road to go down. You might think you can do some awesome regexes, or find someone who can help you with it, but sanitizing SOME markup and leaving other is just crazy talk.
I highly recommend you look into BBCode or another token system. Even something untokenized such as what SO uses, is probably a much better solution.
