I am using SearchIndex="All" in the Amazon Product API and getting no results. When I specify the category, I do get results.
Does anyone know if there are any restrictions on this search index?
Thanks
Since there's no code snippet to look at, I may be off-base here, but make sure you are using Operation=ItemSearch in your request.
If you have the Developer's Guide PDF downloaded, there's a lot of great information starting on page 253 which includes restrictions and necessary inclusions and examples.
Cheers
There are certain limitations due to the large number of items listed at Amazon, so what they do is force you to use a "SearchIndex". It's not a very good name, but it refers to a department, similar to the departments listed on the Amazon homepage: Books, Electronics, etc.
Here is an excerpt from page 103 of the API Dev Guide, version 2010-11-01. Be sure to use the same version of the Dev Guide as your API call, because the functionality changes between versions. You can download the Dev Guide here:
http://www.onlineinvestingai.com/publicFiles/Amazon-Product-Advertising-API-Dev-Guide-2010-11-01.pdf .
I uploaded it to the above link because it is nearly impossible to find on the Amazon Dev site.
Searching Across Indices
ItemSearch requests require that you specify a search index. This is because searching across the millions of products in Amazon databases would take too long. Product Advertising API does, however, enable you to search across multiple search indices using the All or Blended search indices.
All Search Index
You can use the All search index to do an ItemSearch search through all search indices. There are, however, a number of restrictions placed on this request: the only parameter that you can use in the request is Keywords, and you cannot, for example, sort results.
Note: You cannot use the All search index in an ItemLookup request.
The Amazon Product Advertising API is actually fairly easy to use. The hard part is finding the documentation on the Amazon site.
Hope that helps. The document is long and difficult to understand at first, but after you try different searches and see the results, it starts to make sense.
Here are two more documents (for the same version of the API) that may be helpful:
Getting Started Guide:
http://www.onlineinvestingai.com/publicFiles/Amazon-Product-Advertising-API-Getting-Started-Guide-2010-11-01.pdf
Quick Reference Card:
http://www.onlineinvestingai.com/publicFiles/Amazon-Product-Advertising-API-Quick-Reference-Card-2010-11-01.pdf
Look, my friend, all you need to do when searching with All is use just the Keywords parameter. Don't assign any other parameters in the request and you will get results, but only 50 results, because Amazon forces you to specify a category to get more.
This is an old question, but working with the Product Advertising API today, I have found nothing but dead ends and frustration trying to find answers. Hoping this will help a lot of folks who get past the request signing and need to start searching.
A lot of the C# examples out there use the following:
ItemSearchRequest request = new ItemSearchRequest();
request.SearchIndex = "Books";
request.Title = "WCF";
request.ResponseGroup = new string[] { "Small" };
The problem is that the example uses "Title" to search on, and I am not getting any results with that either. Use "Keywords" instead and you will see results come back with the SearchIndex set to "All":
ItemSearchRequest request = new ItemSearchRequest();
request.SearchIndex = "All";
request.Keywords = "WCF";
request.ResponseGroup = new string[] { "Small" };
This should resolve your issue.
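For completeness, here is a rough sketch of executing that request end to end through the WCF proxy generated from the AWSECommerceService WSDL. The client and container type names (AWSECommerceServicePortTypeClient, ItemSearch, ItemSearchResponse) come from that generated code, and the access key, associate tag, and request-signing configuration are placeholders you would set up yourself:

using System;

// Rough sketch, assuming a WCF service reference generated from the
// AWSECommerceService WSDL (which provides AWSECommerceServicePortTypeClient,
// ItemSearch, ItemSearchRequest and ItemSearchResponse) and a message-signing
// behavior already configured on the client. Key and tag are placeholders.
var client = new AWSECommerceServicePortTypeClient();

var request = new ItemSearchRequest();
request.SearchIndex = "All";
request.Keywords = "WCF";
request.ResponseGroup = new string[] { "Small" };

var itemSearch = new ItemSearch();
itemSearch.AWSAccessKeyId = "YOUR-ACCESS-KEY-ID";   // placeholder
itemSearch.AssociateTag = "YOUR-ASSOCIATE-TAG";     // placeholder
itemSearch.Request = new ItemSearchRequest[] { request };

ItemSearchResponse response = client.ItemSearch(itemSearch);

// Print the titles from the first result set (null checks omitted for brevity).
foreach (var item in response.Items[0].Item)
{
    Console.WriteLine(item.ItemAttributes.Title);
}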
Related
We are using Azure Search to find courses from a list. We search on three fields. We need fuzzy searches on the Coursename and Keywords fields, but want to include only exact matches for the course code (which has sequential alphanumeric codes like "RB046").
Using the Search Explorer, you can do something like this with the URL:
https://xxx.search.windows.net/indexes/prospectussearchindexlive/docs?api-version=2016-09-01&search=CourseCode:"HCN_6_006" OR Coursename:"HCN_6_006~1" OR Keywords:"HCN_6_006~1"
But in the API it seems you can only have one search term applied to all specified columns. Does anyone know of a way you can do this with the API without performing two searches?
As pointed out in the comments by Bruce Johnston, the feature set (especially with respect to search query syntax) should be largely identical between the REST API and the Azure Search .NET SDK. The Search Explorer on the Azure portal is literally a call into the REST API, so there shouldn't be any differences there.
The following search API call might translate to what you are looking for (I have included the POST version; you should be able to use GET as well if you'd like).
POST /indexes/prospectussearchindexlive/docs/search?api-version=2016-09-01
{
"search": "CourseCode:HCN_6_006 OR Coursename:HCN_6_006~1 OR Keywords:HCN_6_006~1",
"queryType": "full",
"searchMode": "all"
}
You should take a look at the Lucene query syntax for Azure Search, which is documented here: https://learn.microsoft.com/en-us/rest/api/searchservice/lucene-query-syntax-in-azure-search. That will help you write different search queries.
You can also refer to the SDK documentation here: https://learn.microsoft.com/en-us/azure/search/search-howto-dotnet-sdk, which talks about how to use the .NET SDK to perform search queries. Look at the Documents.Search method for more details.
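For reference, a minimal sketch of the same query through the Microsoft.Azure.Search .NET SDK might look like the following; the index and field names come from your example, while the service name and query key are placeholders:

using System;
using Microsoft.Azure.Search;
using Microsoft.Azure.Search.Models;

// Minimal sketch, assuming the Microsoft.Azure.Search NuGet package.
// Service name and query key are placeholders; index and field names
// are taken from the question.
var indexClient = new SearchIndexClient(
    "xxx",                           // search service name
    "prospectussearchindexlive",     // index name
    new SearchCredentials("YOUR-QUERY-KEY"));

var parameters = new SearchParameters
{
    QueryType = QueryType.Full,      // enable the full Lucene syntax
    SearchMode = SearchMode.All
};

// Same Lucene expression as the POST body above: exact match on CourseCode,
// fuzzy (~1) matches on Coursename and Keywords.
DocumentSearchResult results = indexClient.Documents.Search(
    "CourseCode:HCN_6_006 OR Coursename:HCN_6_006~1 OR Keywords:HCN_6_006~1",
    parameters);

foreach (var result in results.Results)
    Console.WriteLine(result.Document["Coursename"]);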
So I'm looking to get a list of all checked out documents based on aspects, specifically cm:checkedOut as mentioned here.
Basically, I want to search for all documents with the aspect cm:checkedOut and assume that that would be the list of all checked out documents.
I've been able to use this in the node browser, but I'm having a hard time finding a REST endpoint that will let me search for a certain aspect. The only thing I found useful was this CMIS endpoint:
Executes a CMIS query statement against the contents of the Repository.
GET /alfresco/service/cmis/query?q={q}&includeAllowableActions={includeAllowableActions?}&includeRelationships={includeRelationships?}&renditionFilter={renditionFilter?}&searchAllVersions={searchAllVersions?}&skipCount={skipCount?}&maxItems={maxItems?}
And I'm assuming I'd have to write a query something like this, but I'm new to Alfresco and I honestly don't know whether I can write a CMIS query that searches for a particular aspect.
So my question is: is there a REST endpoint that will let me search for a specific aspect and find what I want? If it's relevant, I'm using the .NET Framework with C#.
Download the Apache CMIS Workbench and configure it to use the CMIS 1.0 specification, because the DotCMIS implementation only supports 1.0.
And your query is very simple, just use: SELECT * FROM cm:checkedOut
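Since you mentioned C#, a rough sketch of running that query through DotCMIS over the AtomPub binding could look like this; the AtomPub URL and the credentials are placeholders and depend on your Alfresco version:

using System;
using System.Collections.Generic;
using DotCMIS;
using DotCMIS.Client;
using DotCMIS.Client.Impl;

// Rough sketch, assuming the DotCMIS library and Alfresco's CMIS 1.0
// AtomPub binding. The URL and credentials are placeholders.
var parameters = new Dictionary<string, string>
{
    [SessionParameter.BindingType] = BindingType.AtomPub,
    [SessionParameter.AtomPubUrl] = "http://localhost:8080/alfresco/cmisatom",
    [SessionParameter.User] = "admin",
    [SessionParameter.Password] = "admin"
};

ISession session = SessionFactory.NewInstance().CreateSession(parameters);

// Query suggested in the answer above; how aspect queries are exposed
// depends on the Alfresco/CMIS version, so verify it in the CMIS Workbench.
foreach (var result in session.Query("SELECT * FROM cm:checkedOut", false))
{
    // Each result row holds the selected properties for one matching node.
    foreach (var property in result.Properties)
        Console.WriteLine(property.QueryName + " = " + property.FirstValue);
}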
Generally speaking, you can always browse all web scripts and see if there's anything for you there that can do the job.
http://localhost:8080/alfresco/service/index/uri/
Depending on your version of Alfresco, you can use the new Swagger-based API Explorer; an example is here.
https://api-explorer.alfresco.com/api-explorer/
If you look at what Share uses for its advanced search (which means it's available OOTB), you get this.
http://localhost:8080/alfresco/service/index/uri/slingshot/node/search
It has a bunch of parameters you need to send (test this by searching through Share and watching the requests with Firebug), but the main one is "query", which is basically a JSON object of the properties you are searching with.
{"prop_cm_name":"45445656","prop_cm_title":"","prop_cm_description":"","prop_mimetype":"","prop_cm_modified-date-range"
:"","prop_cm_modifier":"","datatype":"cm:content"}
I am working on my mapper and I need to get the full map of newegg.com
I could try to scrape NE directly (which kind of violates NE's policies), but they have many products that are not available via a direct NE search, only via a google.com search, and I need those links too.
Here is the search string that returns 16 million results:
https://www.google.com/search?as_q=&as_epq=.com%2FProduct%2FProduct.aspx%3FItem%3D&as_oq=&as_eq=&as_nlo=&as_nhi=&lr=&cr=&as_qdr=all&as_sitesearch=newegg.com&as_occt=url&safe=off&tbs=&as_filetype=&as_rights=
I want my scraper to go over all results and log hyperlinks to all these results.
I can scrape all the links from Google search results, but Google has a limit of 100 pages per query (1,000 results), and again, Google is not happy with this approach. :)
I am new to this; could you advise or point me in the right direction? Are there any tools or methodologies that could help me achieve my goals?
Google takes a lot of steps to prevent you from crawling their pages, and I'm not talking about merely asking you to abide by their robots.txt. I don't agree with their ethics or their T&C, not even the "simplified" version that they pushed out (but that's a separate issue).
If you want to be seen, then you have to let google crawl your page; however, if you want to crawl Google then you have to jump through some major hoops! Namely, you have to get a bunch of proxies so you can get past the rate limiting and the 302s + captcha pages that they post up any time they get suspicious about your "activity."
Despite being thoroughly aggravated by Google's T&C, I would NOT recommend that you violate it! However, if you absolutely need the data, then you can get a big list of proxies, load them into a queue, and pull a proxy from the queue each time you want to fetch a page. If the proxy works, put it back in the queue; otherwise, discard it. You could even keep a counter for each failed proxy and discard it once it exceeds some number of failures.
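A rough sketch of that proxy-queue bookkeeping in C# (the proxy addresses, the failure threshold, and the per-request HttpClient are arbitrary choices for illustration):

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;

// Sketch of the queue-based proxy rotation described above: take a proxy,
// use it for one request, requeue it on success, count failures and drop
// the proxy once it has failed too often.
class ProxyRotator
{
    private readonly ConcurrentQueue<string> proxies = new ConcurrentQueue<string>();
    private readonly Dictionary<string, int> failures = new Dictionary<string, int>();
    private const int MaxFailures = 3;   // arbitrary threshold

    public ProxyRotator(IEnumerable<string> proxyAddresses)
    {
        foreach (var p in proxyAddresses) proxies.Enqueue(p);
    }

    public string Fetch(string url)
    {
        while (proxies.TryDequeue(out var proxy))
        {
            try
            {
                var handler = new HttpClientHandler { Proxy = new WebProxy(proxy) };
                using (var client = new HttpClient(handler))
                {
                    var html = client.GetStringAsync(url).Result;
                    proxies.Enqueue(proxy);          // worked: put it back in the queue
                    return html;
                }
            }
            catch (Exception)
            {
                // Failed: count it and only requeue while under the threshold.
                failures.TryGetValue(proxy, out var count);
                failures[proxy] = ++count;
                if (count < MaxFailures) proxies.Enqueue(proxy);
            }
        }
        throw new InvalidOperationException("No working proxies left.");
    }
}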
I've not tried it, but you can use Google's Custom Search API. Of course, it starts to cost money after 100 searches a day. I guess they must be running a business ;p
It might be a bit late, but I think it is worth mentioning that you can scrape Google professionally and reliably without causing problems.
Actually, I am not aware of any real threat from scraping Google.
It is challenging if you are inexperienced, but I am not aware of a single case of legal consequences, and I follow this topic closely.
Maybe one of the largest cases of scraping happened some years ago, when Microsoft scraped Google to power Bing. Google was able to prove it by planting fake results which do not exist in the real world, and Bing suddenly picked them up.
Google named and shamed them; that's all that happened, as far as I remember.
Using the API is rarely a real option; it costs a lot of money even for a small number of results, and the free quota is rather small (40 lookups per hour before you are banned).
The other downside is that the API does not mirror the real search results. In your case that may be less of a problem, but in most cases people want the real ranking positions.
Now, if you do not accept Google's TOS or choose to ignore it (they did not care about your TOS when they scraped you in their startup days), you can go another route.
Mimic a real user and get the data directly from the SERPs.
The key here is to send around 10 requests per hour (this can be increased to 20) from each IP address (yes, you use more than one IP). That rate has proven to cause no problems with Google over the past years.
Use caching, databases, and IP rotation management to avoid hitting it more often than required.
The IP addresses need to be clean, unshared, and if possible without an abusive history.
The originally suggested proxy list would complicate things a lot, as you receive unstable, unreliable IPs with questionable abusive use, sharing, and history.
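As a sketch of the pacing rule above (roughly 10 requests per hour per IP), something like the following throttle is enough; the IP strings are just whatever identifiers your rotation scheme uses:

using System;
using System.Collections.Generic;

// Sketch of the pacing rule described above: allow roughly 10 requests per
// hour per IP address and report how long to wait otherwise.
class PerIpThrottle
{
    private readonly Dictionary<string, Queue<DateTime>> history =
        new Dictionary<string, Queue<DateTime>>();
    private const int MaxPerHour = 10;

    public TimeSpan TimeUntilAllowed(string ip)
    {
        if (!history.TryGetValue(ip, out var times))
            history[ip] = times = new Queue<DateTime>();

        // Drop requests older than one hour from the sliding window.
        while (times.Count > 0 && DateTime.UtcNow - times.Peek() > TimeSpan.FromHours(1))
            times.Dequeue();

        if (times.Count < MaxPerHour)
        {
            times.Enqueue(DateTime.UtcNow);   // record this request
            return TimeSpan.Zero;             // ok to send now
        }

        // Otherwise wait until the oldest request in the window expires.
        return times.Peek() + TimeSpan.FromHours(1) - DateTime.UtcNow;
    }
}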
There is an open source PHP project at http://scraping.compunect.com which contains all the features you need to get started. I used it for my work, which has now been running for some years without trouble.
It is a finished project, mainly built to be used as a customizable base for your own project, but it runs standalone too.
Also, PHP is not a bad choice: I was originally sceptical, but I ran PHP (5) as a background process for two years without a single interruption.
The performance is easily good enough for such a project, so I would give it a shot.
Otherwise, PHP code reads much like C/Java: you can see how things are done and repeat them in your own project.
I've been trying to write something like this without any success, so I was wondering if there is a Google API or any other "function" which would allow me to do the following:
List<string> GetTop20Links(string keyword)
{
    // code to download and return the top 20 results (links) as a List<string>
}
I would prefer to use a Google API for .NET.
As far as I can determine from Google's blogs, there once was (and maybe still is) a SOAP web service that let you query structured search results. But you need a so-called API key for the queries, and they don't give them out any more. The successor to this service was claimed to be the AJAX Search API, but I cannot find any current reference to it.
On the Google API page there is a Custom Search service, but you have to specify the set of websites the search covers, and you either need to show ads along with the results or pay a fee for usage.
You see, it's not in Google's interest to let somebody easily query their search engine and then just use the results for whatever purpose. They make money through the ads; that's their business model.
So if you want to implement that function, you would have to turn to HTML scraping, which is ugly at best, tends to break often, and is difficult to get right.
BTW: you can do this quite easily with Bing. There is a link to the Bing Search API here and a code sample here.
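For what it's worth, a hedged sketch of GetTop20Links against the Bing Web Search v7 REST endpoint might look like this; the endpoint, header name, and JSON shape are from that API version and may differ in yours, the subscription key is a placeholder, and Newtonsoft.Json is assumed for parsing:

using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;
using Newtonsoft.Json.Linq;

// Sketch against the Bing Web Search v7 REST endpoint; the key is a placeholder
// and the response shape may differ in other API versions.
class Top20Links
{
    static async Task<List<string>> GetTop20Links(string keyword)
    {
        var links = new List<string>();
        var url = "https://api.cognitive.microsoft.com/bing/v7.0/search"
                  + "?count=20&q=" + Uri.EscapeDataString(keyword);

        using (var http = new HttpClient())
        {
            http.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "YOUR-KEY");
            var json = JObject.Parse(await http.GetStringAsync(url));

            // Each web result carries its link in the "url" field.
            foreach (var result in json["webPages"]["value"])
                links.Add((string)result["url"]);
        }
        return links;
    }
}

It returns a Task<List<string>>; call .Result on it if you really need the synchronous signature from the question.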
Does anyone know of a "similar words or keywords" algorithm available in open source or via an API? I am looking for something sort of like a thesaurus but smarter.
So for example:
intel
returns:
processor,
i7 core chip,
quad core chip,
.. etc
Any ideas or even something to point me in the right direction in C#?
Edit:
I would love to hear your thoughts, but why can't we just use the Google AdWords API to generate keywords relevant to those entered?
Why not send a search query out to Google and parse what it returns?
Also, check out Google Sets.
There is no algorithm for such a thing. You are going to have to acquire thesaurus data and load it into a data structure; then it is a simple dictionary lookup (you can use the C# Dictionary class for that). Maybe you can look at WordNet or the Moby Thesaurus as data sources. Another option is to use a thesaurus server and fetch the information online as needed.
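Once you have data from something like WordNet or Moby loaded, the lookup itself is trivial; a sketch with made-up entries:

using System;
using System.Collections.Generic;

// Sketch of the dictionary lookup once thesaurus data has been loaded.
// The entries below are made up; real data would come from WordNet,
// Moby, or a similar source.
var related = new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase)
{
    ["intel"] = new List<string> { "processor", "i7 core chip", "quad core chip" }
};

if (related.TryGetValue("intel", out var words))
    Console.WriteLine(string.Join(", ", words));   // processor, i7 core chip, quad core chip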
You will need a large database containing this information. The rest is simple: look up the input and see what related words are stored.
The hard part is generating the database. Doing it manually might take years if you want to cover a large number of words and topics.
Generating it is surely non-trivial. Maybe you could try downloading web pages and analyzing which words frequently appear together, but I assume it would still take months to build and tune this and finally gather good-quality data. Extracting links from Wikipedia might be a good source of information because of its semi-structured nature.
I've made the OpenOffice thesaurus functions available for .NET in the NHunspell project. You can use the OpenOffice thesaurus files.
Here is the NHunspell Project
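A minimal sketch of a synonym lookup with NHunspell's MyThes class and the OpenOffice files might look like this; the file names are placeholders for whichever language files you download, and the exact API shape may vary between NHunspell versions, so check it against the current release:

using System;
using NHunspell;

// Sketch of a thesaurus lookup with NHunspell. The .aff/.dic/.dat file names
// are placeholders for the OpenOffice files you download, and the exact API
// may differ between NHunspell versions.
using (var hunspell = new Hunspell("en_us.aff", "en_us.dic"))
{
    var thesaurus = new MyThes("th_en_us_new.dat");
    ThesResult result = thesaurus.Lookup("intel", hunspell);

    if (result != null)
    {
        foreach (ThesMeaning meaning in result.Meanings)
            Console.WriteLine(string.Join(", ", meaning.Synonyms));
    }
}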