get top 20 google results c# api - c#

I've been trying to write something like this without any success, so I was wondering if there is a Google API or any other "function" that would allow me to do the following:
List<string> GetTop20Links(string keyword)
{
    // code to download and return the top 20 results (links) as a List<string>
}

I prefer to use Google API for .NET.

As far as I can determine from Google's blogs, there once was (and maybe still is) a SOAP web service that let you query structured search results. But you need a so-called API key for the query, and they don't give them out any more. The successor to this service was claimed to be the AJAX Search API, but I cannot find any current reference to it.
On the Google API page there is a custom search service, but you have to specify the set of websites that the search covers, and you either need to show the ads along with the results or pay a fee for the usage.
You see, it's not in Google's interest to let somebody easily query their search engine and then use the results for whatever purpose. They make money through the ads; that's their business model.
So if you want to implement that function, you would have to turn to HTML scraping, which is ugly at best, tends to break often and is difficult to get right.
BTW: You can do that quite easily with Bing. There is a link to the Bing Search API here and a code sample here
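For the custom search service mentioned above, a minimal sketch of the "top 20 links" function could look like the following. It is written in Python for brevity and assumes the Custom Search JSON API endpoint with its key, cx, q, num and start parameters, returning at most 10 results per request, so two pages are fetched. The API key, the engine ID and the injectable fetch helper are placeholders for illustration and are not part of the original question.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the Custom Search JSON API.
API_URL = "https://www.googleapis.com/customsearch/v1"


def build_page_url(keyword, api_key, engine_id, start):
    """Build the request URL for one page of up to 10 results."""
    params = {"key": api_key, "cx": engine_id, "q": keyword,
              "num": 10, "start": start}
    return API_URL + "?" + urllib.parse.urlencode(params)


def fetch_json(url):
    """Download a URL and parse the body as JSON."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def get_top_20_links(keyword, api_key, engine_id, fetch=fetch_json):
    """Return up to 20 result links by requesting results 1-10 and 11-20."""
    links = []
    for start in (1, 11):
        page = fetch(build_page_url(keyword, api_key, engine_id, start))
        # A page without results has no "items" key, hence the default.
        links.extend(item["link"] for item in page.get("items", []))
    return links[:20]
```

The fetch argument exists only so the paging logic can be exercised without network access; in normal use it is left at its default.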

Related

Use different search terms for different columns

We are using Azure Search to find courses from a list. We search on three fields. We need fuzzy searches on the Coursename and Keywords, but want only to include exact matches for the course code (which has sequential numeric codes like "RB046").
Using the Search Explorer, you can do something like this with the URL:
https://xxx.search.windows.net/indexes/prospectussearchindexlive/docs?api-version=2016-09-01&search=CourseCode:"HCN_6_006" OR Coursename:"HCN_6_006~1" OR Keywords:"HCN_6_006~1"
But in the API it seems you can only have one search term applied to all specified columns. Does anyone know of a way you can do this with the API without performing two searches?
As Bruce Johnston pointed out in the comments, the feature set (especially the search query syntax) should be largely identical between the REST API and the Azure Search .NET SDK. The Search Explorer in the Azure portal is literally a call into the REST API, so there shouldn't be any differences there.
The following search API call might translate to what you are looking for (I have included the POST version; you should be able to use GET as well if you'd like).
POST /indexes/prospectussearchindexlive/docs/search?api-version=2016-09-01
{
"search": "CourseCode:HCN_6_006 OR Coursename:HCN_6_006~1 OR Keywords:HCN_6_006~1",
"queryType": "full",
"searchMode": "all"
}
You should take a look at the Lucene syntax for Azure search, which is here: https://learn.microsoft.com/en-us/rest/api/searchservice/lucene-query-syntax-in-azure-search that will help you write different search queries.
You can also refer to the SDK documentation here: https://learn.microsoft.com/en-us/azure/search/search-howto-dotnet-sdk which talks about how to use the .NET SDK to perform search queries. Look at the Documents.Search method for more details.
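As a rough illustration of the request body shown above, the per-field query can be assembled from a single user-supplied term. This is only a sketch in Python: the field names come from the question, the helper name build_course_search is made up for this example, and the term is assumed to contain no characters that need escaping in Lucene syntax.

```python
def build_course_search(term, fuzzy_distance=1):
    """Build a full-Lucene search body: exact match on CourseCode,
    fuzzy match (edit distance fuzzy_distance) on Coursename and Keywords.

    Assumption: term needs no Lucene escaping.
    """
    fuzzy = "%s~%d" % (term, fuzzy_distance)
    query = " OR ".join([
        "CourseCode:%s" % term,
        "Coursename:%s" % fuzzy,
        "Keywords:%s" % fuzzy,
    ])
    # queryType "full" switches on the Lucene syntax used for the field scopes.
    return {"search": query, "queryType": "full", "searchMode": "all"}
```

The returned dictionary can be serialized to JSON and sent as the POST body.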

How to search for an aspect/property?

So I'm looking to get a list of all checked out documents based on aspects, specifically cm:checkedOut as mentioned here.
Basically, I want to search for all documents with the aspect cm:checkedOut and assume that that would be the list of all checked out documents.
I've been able to use this in the node browser, but I'm having a hard time finding a REST endpoint that will let me search for a certain aspect. The only thing I found useful was this CMIS endpoint:
Executes a CMIS query statement against the contents of the Repository.
GET /alfresco/service/cmis/query?q={q}&includeAllowableActions={includeAllowableActions?}&includeRelationships={includeRelationships?}&renditionFilter={renditionFilter?}&searchAllVersions={searchAllVersions?}&skipCount={skipCount?}&maxItems={maxItems?}
And I'm assuming I'd have to write a query something like this. But I'm new to Alfresco and I honestly don't know if I can write a CMIS query to search for a particular aspect?
So my question is: is there a REST endpoint that will let me search for a specific aspect and do what I want to find? If it's relevant, I'm using a .NET framework with C#.
Download the Apache Chemistry CMIS Workbench and configure it to use the CMIS 1.0 specification, because the DotCMIS implementation only supports 1.0.
And your query is very simple, just use: SELECT * FROM cm:checkedOut
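To send that statement to the CMIS query endpoint quoted in the question, it has to be URL-encoded into the q parameter. A minimal sketch in Python, assuming a local Alfresco at http://localhost:8080 and leaving authentication out; the helper name build_cmis_query_url is hypothetical:

```python
from urllib.parse import urlencode


def build_cmis_query_url(base_url, statement, max_items=100, skip_count=0):
    """Build the GET URL for the CMIS query web script from the question.

    Only q, skipCount and maxItems are set; the other optional
    parameters of the endpoint are left out.
    """
    params = {"q": statement, "skipCount": skip_count, "maxItems": max_items}
    return base_url.rstrip("/") + "/alfresco/service/cmis/query?" + urlencode(params)


url = build_cmis_query_url("http://localhost:8080", "SELECT * FROM cm:checkedOut")
```

The same encoding step applies when building the request from C#.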
Generally speaking, you can always browse all web scripts and see if there's anything for you there that can do the job.
http://localhost:8080/alfresco/service/index/uri/
Depending on your version of Alfresco, you can use the new Swagger based API explorer, example here.
https://api-explorer.alfresco.com/api-explorer/
If you look at what Share uses for its advanced search (which means it's available OOTB), you get this.
http://localhost:8080/alfresco/service/index/uri/slingshot/node/search
It has a bunch of parameters you need to send (test this by searching through Share and using Firebug) but the main one is the "query" one, which is basically a JSON of properties you search with.
{"prop_cm_name":"45445656","prop_cm_title":"","prop_cm_description":"","prop_mimetype":"","prop_cm_modified-date-range":"","prop_cm_modifier":"","datatype":"cm:content"}

Google Search returns 503 error for complicated search queries

When I try to download a Google Search result page using HttpWebRequest in C#, everything works very well if I use simple search terms, like
http://www.google.com/search?q=stackoverflow
But when I try to make it more complex, for example
http://www.google.com/search?q=inurl%3A%22goethe%22%20filetype%3Apdf
which means
inurl:"goethe" filetype:pdf
, I will receive a 503 error because Google thinks I'm a bot. Is there any workaround?
Edit: UserAgent is set to "Mozilla/5.0".
well.. if your search is done programmatically, then Google just so happens to be right.. you ARE a bot :-)
cheers!
I don't believe it has much to do with how complex your query happens to be. The only thing that really matters is if they think that you're a bot. If you're submitting queries at a very high rate, then Google will think you're a bot so there are several possible solutions:
Reduce the rate at which you're sending queries.
Use proxies to make multiple queries.
Additionally, it's important to note that if you make web requests without saving cookies, then that might be another "signal" for Google to think that you're a bot. You should also be very careful not to get the proxies blocked by Google, because you're scraping the big G. It's hard to find free proxies and if you abuse them, then they'll get shut down so be a good citizen!
Good luck!
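The two signals discussed in this answer, request rate and missing cookies, can be sketched as follows. This is an illustration in Python rather than the asker's HttpWebRequest code: Throttle, make_opener, build_search_url and fetch_all are names invented for this example, and the interval to use is left to the caller because the answer does not say what rate is acceptable.

```python
import http.cookiejar
import time
import urllib.parse
import urllib.request


class Throttle:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def make_opener(user_agent="Mozilla/5.0"):
    """Create an opener that keeps cookies between requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


def build_search_url(query):
    """Percent-encode the query the same way as the URL in the question."""
    return "http://www.google.com/search?q=" + urllib.parse.quote(query, safe="")


def fetch_all(queries, opener, throttle):
    """Fetch each query in turn, pausing between requests."""
    pages = []
    for query in queries:
        throttle.wait()
        with opener.open(build_search_url(query)) as response:
            pages.append(response.read())
    return pages
```

The clock and sleep arguments are injectable only so the timing logic can be checked without actually waiting.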
Try Google Custom Search APIs and Tools. This will allow you to retrieve search results without fear of being denied access (up to a limit).
Alternatively, mimic all nuances of a typical search query. For example, in my browser, searching for inurl:"goethe" filetype:pdf results in this URL being requested.
Then there are cookies and other http headers. Make it look a lot more like a browser is requesting it.

What is the easiest way to programmatically extract structured data from a bunch of web pages?

What is the easiest way to programmatically extract structured data from a bunch of web pages?
I am currently using an Adobe AIR program I have written to follow the links on one page and grab a section of data off of the subsequent pages. This actually works fine, and for programmers I think this (or the equivalent in other languages) provides a reasonable approach, to be written on a case-by-case basis. Maybe there is a specific language or library that allows a programmer to do this very quickly, and if so I would be interested in knowing what they are.
Also do any tools exist which would allow a non-programmer, like a customer support rep or someone in charge of data acquisition, to extract structured data from web pages without the need to do a bunch of copy and paste?
If you search Stack Overflow for WWW::Mechanize and pQuery, you will see many examples using these Perl CPAN modules.
However, because you mentioned "non-programmer", perhaps the Web::Scraper CPAN module may be more appropriate. It's more DSL-like and so perhaps easier for a non-programmer to pick up.
Here is an example from the documentation for retrieving tweets from Twitter:
use URI;
use Web::Scraper;
my $tweets = scraper {
    process "li.status", "tweets[]" => scraper {
        process ".entry-content", body => 'TEXT';
        process ".entry-date", when => 'TEXT';
        process 'a[rel="bookmark"]', link => '@href';
    };
};
my $res = $tweets->scrape( URI->new("http://twitter.com/miyagawa") );
for my $tweet (@{$res->{tweets}}) {
    print "$tweet->{body} $tweet->{when} (link: $tweet->{link})\n";
}
I found YQL to be very powerful and useful for this sort of thing. You can select any web page from the internet and it will make it valid and then allow you to use XPATH to query sections of it. You can output it as XML or JSON ready for loading into another script/ application.
I wrote up my first experiment with it here:
http://www.kelvinluck.com/2009/02/data-scraping-with-yql-and-jquery/
Since then YQL has become more powerful with the addition of the EXECUTE keyword, which allows you to write your own logic in JavaScript and run it on Yahoo!'s servers before the data is returned to you.
A more detailed writeup of YQL is here.
You could create a datatable for YQL to get at the basics of the information you are trying to grab, and then the person in charge of data acquisition could write very simple queries (in a DSL which is pretty much English) against that table. It would be easier for them than "proper programming" at least...
There is Sprog, which lets you graphically build processes out of parts (Get URL -> Process HTML Table -> Write File), and you can put Perl code in any stage of the process, or write your own parts for non-programmer use. It looks a bit abandoned, but still works well.
I use a combination of Ruby with hpricot and watir, which gets the job done very efficiently.
If you don't mind it taking over your computer, and you happen to need JavaScript support, WatiN is a pretty damn good browsing tool. Written in C#, it has been very reliable for me in the past, providing a nice browser-independent wrapper for running through and getting text from pages.
Are commercial tools viable answers? If so, check out http://screen-scraper.com/; it is super easy to set up and use to scrape websites. They have a free version which is actually fairly complete. And no, I am not affiliated with the company :)

'SearchIndex="All"' not working in Amazon Product API

I am using SearchIndex="All" in the Amazon Product API and getting no results. When I specify the category, I do get results.
Does anyone know if there are any restrictions on this search index?
Thanks
Since there's no code snippet to look at, I may be off-base here, but make sure you are using Operation=ItemSearch in your request.
If you have the Developer's Guide PDF downloaded, there's a lot of great information starting on page 253 which includes restrictions and necessary inclusions and examples.
Cheers
There are certain limitations due to the large number of items listed at Amazon. So, what they do is force you to use a "SearchIndex". It's not a very good name, but it means the department similar to those listed on the Amazon homepage. These departments include Books, Electronics, etc.
Here is an excerpt from page 103 of the API Dev Guide version 2010-11-01. Be sure to use the same version of the Dev Guide as your API call because the functionality changes between versions. You can download the Dev Guide:
http://www.onlineinvestingai.com/publicFiles/Amazon-Product-Advertising-API-Dev-Guide-2010-11-01.pdf .
I uploaded it to the above link because it is nearly impossible to find on the Amazon Dev site.
Searching Across Indices
ItemSearch requests require that you specify a search index. This is because searching across the millions of products in Amazon databases would take too long. Product Advertising API does, however, enable you to search across multiple search indices using the All or Blended search indices.
All Search Index
You can use the All search index to do an ItemSearch search through all search indices. There are, however, a number of restrictions placed on this request:
the only parameter that you can use in the request is Keywords, and you cannot, for example, sort results.
Note: You cannot use the All search index in an ItemLookup request.
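The restrictions in this excerpt can be expressed as a small pre-flight check on the request parameters. The following is a sketch in Python, not part of the Amazon API: the function name and the NON_SEARCH_FIELDS set (fields treated as request plumbing rather than search parameters) are assumptions made for illustration.

```python
# Assumption: these fields describe the request itself and are not
# counted as search parameters by the restriction quoted above.
NON_SEARCH_FIELDS = {"Operation", "SearchIndex", "ResponseGroup"}


def check_all_index_request(params):
    """Return a list of problems for a request using SearchIndex=All.

    An empty list means the request fits the quoted restrictions.
    Requests for any other search index are not checked.
    """
    problems = []
    if params.get("SearchIndex") != "All":
        return problems
    if params.get("Operation") != "ItemSearch":
        problems.append("SearchIndex=All is only valid with Operation=ItemSearch")
    extras = sorted(set(params) - NON_SEARCH_FIELDS - {"Keywords"})
    if extras:
        problems.append("only Keywords may be used with SearchIndex=All; remove: "
                        + ", ".join(extras))
    return problems
```

For example, a request that sets Title or Sort together with SearchIndex=All is reported, while the same request using only Keywords passes.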
The Amazon Product Advertising API is actually fairly easy to use. The hard part is finding the documentation on the Amazon site.
Hope that helps. The document is long and difficult to understand at first, but after you try different searches and see the results it works.
Here are two more documents (for the same version of the API) that may be helpful:
Getting Started Guide:
http://www.onlineinvestingai.com/publicFiles/Amazon-Product-Advertising-API-Getting-Started-Guide-2010-11-01.pdf
Quick Reference Card:
http://www.onlineinvestingai.com/publicFiles/Amazon-Product-Advertising-API-Quick-Reference-Card-2010-11-01.pdf
All you need to do when searching with the All index is use just the Keywords parameter. Don't assign any other parameters in the request and you will get results, but only 50 of them, because Amazon forces you to specify a category.
This is an old question, but working with the Product Advertising API today, I have found nothing but dead ends and frustration trying to find answers. Hoping this will help a lot of folks who get past the signing and need to start searching.
A lot of the C# examples out there use the following:
ItemSearchRequest request = new ItemSearchRequest();
request.SearchIndex = "Books";
request.Title = "WCF";
request.ResponseGroup = new string[] { "Small" };
The problem is that the example uses "Title" to search on, and I am not getting any results with this either. Use "Keywords" instead and you will see results come back with the SearchIndex set to "All":
ItemSearchRequest request = new ItemSearchRequest();
request.SearchIndex = "All";
request.Keywords = "WCF";
request.ResponseGroup = new string[] { "Small" };
This should resolve your issue.
