Importing Data from Dictionary Web site - c#

I need to translate a group of words using a free online dictionary, so I wrote a simple C# program that sends HTTP requests and then parses the returned HTML to extract the meanings.
However, the free web site stops responding after 130 requests and asks me to manually type the words shown in an image (a CAPTCHA) in order to continue. How can I overcome this problem?
Thanks,
Samer

This isn't a problem with your code; it is their website protecting itself from being spammed with hits from a single user. The easiest thing to do would be to get a dictionary of your own; then there would be no CAPTCHA to get around.

Related

Is there a way to capture a view and save it?

We have many elements in a ContentPage. The goal is to take a picture of a specific element and then have access to that data, to save it or possibly do other things such as cropping it.
So this question is twofold: is there a way to photographically capture a given element? And is there a way to do this if the element is not fully in view? For example, a ScrollView could have some of its elements not currently in view.
Our attempt at this is to take device-specific screenshots and crop them to a given element. The screenshots are working, but we aren't having any luck with the cropping. And in the case described above the screenshot will not work anyway, as the view isn't fully visible.
Is there a way to obtain the "graphical" (photo) data of an element at a given time even if it's not currently visible/partially visible?
Thanks for reading in advance.
After a lot of talking, this is what I understand:
1.) The users of your application are the workers of your company.
2.) The application is for managing the accounts of your company's customers.
3.) The customers have no access to their data, in any shape or form.
4.) Part of the customer data is their email address.
5.) You want to send a copy of their data to the customers.
6.) As emails do not allow formatting that well, you want to send that data as a screenshot of the UI.
If I got all that right:
You are neck-deep in an XY problem. Or rather a ((XY)Y)Y problem - an XY problem of the 3rd generation.
The obvious solution would be to fix point 3 and give your customers access to their data already:
You can do that via an extra program, an app, a web page or anything similar. If they can receive emails, they can download an app or open a web page and see their data there. It may need a login, but nothing special. There are even ways to encode data or direct links into emails and register your program as the handler for a custom link format. Indeed, that is how Steam links on the desktop work.
Meanwhile, the in-house users get a "customer management" program that allows more direct access to the customers' data in the database (I assume you have a backend database, but it is at least possible that you do not).
If you cannot fix point 3 for stupid boss/legal reasons (these are the only valid reasons I can imagine, and I cannot stress enough how stupid the boss would have to be for that), you should at least be able to fix it at point 5/6:
The first option would be to send plain-text emails. People often underestimate just how much is possible with pure text. It is basically like writing on a console, but even that is enough of a medium to make art in.
The other ways involve managing the limitations of HTML:
Safe HTML Mail
The main security issue with HTML mails is the "downloading external content" part. Those operations cannot be reliably scanned by virus scanners and the like - especially in the age of HTTPS. Unless we talk about Kaspersky and the stupid idea they had.
And even if they could be scanned reliably, the mere request for those files can be used by spam senders to verify that the email address is still in use. So it is a no-go too.
So you will need to inline as much as possible. Inlining images is not really feasible. While HTML does have a standard for that - you Base64-encode the binary into the HTML - it does not work reliably. At least Microsoft Outlook is known to interpret all Base64 images in an email as attachments - even the inlined ones. And even if they fixed this or it is no longer a relevant issue, inlining images tends to increase the HTML size significantly.
You can use CSS to some degree. But aside from inlining it, you might have to go back a step or two. In the end, email programs are really weak web browsers, so they do not necessarily support all the latest stuff instantly. Anything below CSS 3.0 should work reliably by now. But you had better ask someone once you have more specific requirements for this email.
PDF Attachment
Somewhat more established is to create a .PDF file and send it. All those bills and other documents in .PDF format that you get have been created on demand from a database, by the same code that also sends the email. In many cases the demand was automated too, or the sending program was an outright background process.
.PDF allows all the formatting you could want. It can embed images inline. And there are plenty of ways to create a .PDF from code. And as you can send it as an attachment, the virus scanner has time to go over it. And we are not in the last millennium, when a PDF reader was an uncommon program to have installed (I still remember the times when a current version of Acrobat PDF Reader was delivered on every CD with a .PDF format handbook).
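To make the attachment route a bit more concrete, here is a minimal sketch using System.Net.Mail. It assumes the PDF bytes already exist (produced by whatever PDF library you pick); the class name, method name, subject text and file name are all made up for illustration. It only builds the message and does not send it.

```csharp
using System.IO;
using System.Net.Mail;
using System.Net.Mime;

public static class CustomerMail
{
    // Builds (but does not send) a plain-text email with a PDF attached.
    // pdfBytes is assumed to come from the PDF library of your choice.
    public static MailMessage BuildStatementMail(string from, string to, byte[] pdfBytes)
    {
        var message = new MailMessage(from, to)
        {
            Subject = "Your account data",
            Body = "Please find your current account data attached as a PDF.",
            IsBodyHtml = false
        };

        // The stream has to stay open until the message is sent;
        // disposing the MailMessage disposes its attachments as well.
        var stream = new MemoryStream(pdfBytes);
        message.Attachments.Add(
            new Attachment(stream, "statement.pdf", MediaTypeNames.Application.Pdf));

        return message;
    }
}
```

Sending would then be a matter of handing the message to an SmtpClient configured for your own mail server, and disposing the message afterwards.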
If you are still dead serious about the whole "make an image of the UI and send that", my only question is: how many years have been allotted for that?

Connecting To A Website To Look Up A Word(Compiling Mass Data/Webcrawler)

I am currently developing a word-completion application in C#, and after getting the UI up and running, keyboard hooks set, and other things of that nature, I came to the realization that I need a word list. The only issue is, I can't seem to find one with the appropriate information. I also don't want to spend an entire week formatting and gathering a word list by hand.
The information I want is something like "TheWord, The definition, verb/etc."
So, it hit me. Why not download a basic word list with nothing but words (already did this; there are about 109,523 words), write a program that iterates through every word, connects to the internet, retrieves the data (definition etc.) from some arbitrary site, and creates XML data from said information? It could be 100% automated, and I would only have to wait for maybe an hour depending on my internet connection speed.
This, however, brought me to a few questions.
How should I connect to a site to look up these words? << This is my actual question.
How would I read this information from the website?
Would I piss off my ISP or the website for that matter?
Is this a really bad idea? Lol.
How do you guys think I should go about this?
EDIT
Someone noticed that Dictionary.com uses the word as a suffix in the URL. This will make it easy to iterate through the word file. I also see that the web page is served as XHTML (or maybe just HTML). Here is the source for the word "Cat": http://pastebin.com/hjZj6AC1
For what you marked as your actual question: you just need to download the page from the website and find what you need in it.
A great tool for this is CsQuery, which lets you use jQuery selectors.
You could do something like this:
using CsQuery;
// ".definitionDiv" is a placeholder; use a selector that matches the target page's markup
var dom = CQ.CreateFromUrl("http://www.jquery.com");
string definition = dom.Select(".definitionDiv").Text();
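As for iterating over the whole word file, a rough sketch could look like the following. It uses HttpClient instead of CsQuery just to stay dependency-free, and it pauses between requests so the site is not hammered. The base URL is a made-up placeholder, and the class and method names are hypothetical; nothing here is specific to any real dictionary site.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

public static class WordLookup
{
    // Hypothetical base URL; substitute a site you are allowed to query.
    private const string BaseUrl = "https://dictionary.example/browse/";

    // The word becomes the last part of the URL, escaped for safety.
    public static string BuildUrl(string word)
    {
        return BaseUrl + Uri.EscapeDataString(word.Trim().ToLowerInvariant());
    }

    // Downloads one page per word, one request at a time,
    // waiting for the given delay after each request.
    public static async Task<Dictionary<string, string>> FetchAllAsync(
        IEnumerable<string> words, TimeSpan delay)
    {
        var pages = new Dictionary<string, string>();
        using (var client = new HttpClient())
        {
            foreach (var word in words)
            {
                pages[word] = await client.GetStringAsync(BuildUrl(word));
                await Task.Delay(delay);
            }
        }
        return pages;
    }
}
```

Keep the numbers in mind: with a one-second pause, 109,523 words work out to roughly 30 hours, not one. Check the site's terms of use before running anything like this, since many sites block or CAPTCHA clients that send bulk requests.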

Calling c# function on string value BEFORE postback?

I am new to ASP, and have jumped right in and started a new MVC 4 project.
I am using the standard template and am trying to edit the login page. The problem I am trying to solve is this:
If you open Fiddler and log in, you can see the user name and password in plain text. What I would like to do is run a C# function I have created in a helpers file BEFORE the post is submitted, for example on a button click event. Is this possible?
If so can someone point me in the direction of a tutorial/ example please as this has baffled me for a few days now!
Thanks again for your help
Don't reinvent the wheel. Use https instead so that data does not travel as plain text.
You can't run a C# function before the postback, how would you accomplish that? C# code runs server-side, but you post the form from the client-side. You can't apply a C# method on something you haven't shown it yet.
You have basically two options:
1.) use JavaScript to somehow alter the data before sending it to the server
2.) use SSL to protect the channel
The problem with the first option is that ANYONE who sees the form can see your JavaScript code as well. In other words, no matter how strong a protection you come up with, the attacker sees the algorithm, so he can decode the data very easily... Probably the most reliable option is the second one - SSL. It isn't 100%, but at least it's much harder to penetrate...
If you want to encrypt the data before the form is submitted you can only rely on client side code - javascript. This is in no way the optimal solution, as already pointed out by others, you should use https.

Parsing Data from a website in WP7

There is a website that keeps updating live information about the bus timings in Helsinki.
I want to parse the live information from the website and display it on my WP7 phone. The user needs to enter the bus stop number and the WP7 app should show the buses/trams currently in the bus stop.
Is there any way I could obtain the real time information from the website?
If you look at the source of the website (http://www.omatlahdot.fi/omatlahdot/web?command=fullscreen&stop=1020455) -- in IE right-click on the page and select View Source -- you'll see that there's really very little in the actual source file, in particular none of the data is there. All of the hard work is coming from the referenced javascript file scripts/fullscreen_header.js (full path is http://www.omatlahdot.fi/omatlahdot/scripts/fullscreen_header.js). You want to download that .js file and study how it retrieves data with AJAX calls. Start with the reloadPage function.
You can make these same calls (e.g., using WebClient) to retrieve the data into your application. If you want to extract the data from the returned HTML, I'd consider parsing it simply as a string since I am assuming that it would have a very regular structure and dragging in a general-purpose HTML parser would probably be overkill.
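A minimal sketch of that approach, assuming the returned HTML really is regular enough for plain string searching. The class name, method names and the markers you pass in are hypothetical; the real ones depend on what the AJAX calls actually return.

```csharp
using System;
using System.Net;

public static class StopPageParser
{
    // Returns the trimmed text between the first startMarker and the
    // next endMarker, or null if either marker is missing.
    public static string ExtractBetween(string html, string startMarker, string endMarker)
    {
        int start = html.IndexOf(startMarker, StringComparison.Ordinal);
        if (start < 0) return null;
        start += startMarker.Length;

        int end = html.IndexOf(endMarker, start, StringComparison.Ordinal);
        if (end < 0) return null;

        return html.Substring(start, end - start).Trim();
    }

    // Asynchronous download via WebClient; on Windows Phone only the
    // asynchronous WebClient methods are available.
    public static void Download(Uri address, Action<string> onHtml)
    {
        var client = new WebClient();
        client.DownloadStringCompleted += (sender, e) =>
        {
            // Errors are silently ignored in this sketch
            if (e.Error == null) onHtml(e.Result);
        };
        client.DownloadStringAsync(address);
    }
}
```

You would call Download with the URL that the script requests and then run ExtractBetween over the result for each piece of data you need.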
Alternatively, you might find out whether omatlahdot.fi provides the data as JSON or XML feeds, so you don't have to "screen-scrape" the HTML. I don't read Finnish, so I can't help you with that. Look around on their website (maybe a section called "dev" or "api") or send them an email inquiry.
Please let us know how it works out!

Using C# to retrieve data from a Google search

Here's what I want the program to do:
Read a text file (the text file contains random search criteria like "sunflower seeds", "chrome water faucets", etc) to retrieve a search phrase.
Submit the search phrase to Google and retrieve the first four URLs.
Retrieve the Google Page Rank of each of the returned URLs.
Being a neophyte C# programmer, I can handle #1 easily. Unfortunately, I've never dealt with using the Google APIs before. I do have a Google API key and I'm aware that there is a search limit using the API. At most, I'll probably use this on a dozen search phrases (or "keywords") per day. I can do this manually, but I know there has to be a way to do this with a C# program. I've read that this can be done using AJAX, but I don't know AJAX and I'd rather this just be an executable program on my PC rather than a web-based app. A push in the right direction from someone would be a big help. Also, I really don't want this to be a "screen-scraper", either. Isn't there a way that I can get the info (URLs and Page Rank) from Google without having to scrape a returned HTML search page?
I don't want anyone to write the code for me, just need to know if it's possible and a push towards finding the information on how to accomplish it.
Thanks in advance everyone!
"I don't want anyone to write the code for me, just need to know if it's possible and a push towards finding the information on how to accomplish it."
Look into the WebClient class
http://msdn.microsoft.com/en-us/library/system.net.webclient(VS.80).aspx
Try this:
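// URL-encode the phrase so spaces and punctuation survive in the query string
string googleSearch = @"http://www.google.com/#hl=en&q=" + Uri.EscapeDataString(query);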
where query is the string of your search.
