How to search for an aspect/property? - c#

So I'm looking to get a list of all checked out documents based on aspects, specifically cm:checkedOut as mentioned here.
Basically, I want to search for all documents with the aspect cm:checkedOut and assume that that would be the list of all checked out documents.
I've been able to use this in the node browser, but I'm having a hard time finding a REST endpoint that will let me search for a certain aspect. The only useful thing I found was this CMIS endpoint:
Executes a CMIS query statement against the contents of the Repository.
GET /alfresco/service/cmis/query?q={q}&includeAllowableActions={includeAllowableActions?}&includeRelationships={includeRelationships?}&renditionFilter={renditionFilter?}&searchAllVersions={searchAllVersions?}&skipCount={skipCount?}&maxItems={maxItems?}
I'm assuming I'd have to write a query something like this, but I'm new to Alfresco and I honestly don't know whether a CMIS query can search for a particular aspect.
So my question is: is there a REST endpoint that will let me search for a specific aspect and do what I want to find? If it's relevant, I'm using a .NET framework with C#.

Download the Apache CMIS Workbench and configure it to use the CMIS 1.0 specification, because the dotCMIS implementation only supports 1.0.
And your query is very simple, just use: SELECT * FROM cm:checkedOut
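As a rough sketch, that statement could be sent to the CMIS query endpoint quoted in the question along these lines. The host, port, and paging values are assumptions; only the path and the parameter names (q, skipCount, maxItems) come from the endpoint description above.

```python
from urllib.parse import urlencode

# Assumed base URL for a local Alfresco install; adjust for your server.
BASE_URL = "http://localhost:8080/alfresco/service/cmis/query"

def build_cmis_query_url(statement, max_items=50, skip_count=0):
    """Return the GET URL for a CMIS query statement."""
    params = {"q": statement, "skipCount": skip_count, "maxItems": max_items}
    # urlencode percent-encodes the statement so it is safe in a query string.
    return BASE_URL + "?" + urlencode(params)

url = build_cmis_query_url("SELECT * FROM cm:checkedOut")
print(url)
```

The resulting URL can be requested with any HTTP client (for example HttpClient in .NET) using your Alfresco credentials.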

Generally speaking, you can always browse all web scripts and see if there's anything for you there that can do the job.
http://localhost:8080/alfresco/service/index/uri/
Depending on your version of Alfresco, you can use the new Swagger based API explorer, example here.
https://api-explorer.alfresco.com/api-explorer/
If you look at what Share uses for its advanced search (which means it's available OOTB), you get this.
http://localhost:8080/alfresco/service/index/uri/slingshot/node/search
It has a bunch of parameters you need to send (test this by searching through Share and using Firebug) but the main one is the "query" one, which is basically a JSON of properties you search with.
{"prop_cm_name":"45445656","prop_cm_title":"","prop_cm_description":"","prop_mimetype":"","prop_cm_modified-date-range":"","prop_cm_modifier":"","datatype":"cm:content"}
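A minimal sketch of how that "query" value might be assembled and URL-encoded before being sent. The helper name is hypothetical, and the other parameters the web script expects are not shown here; check them by watching what Share sends.

```python
import json
from urllib.parse import urlencode

def build_share_query(props, datatype="cm:content"):
    """Build the JSON string for the "query" parameter (hypothetical helper)."""
    query = dict(props)            # copy so the caller's dict is not modified
    query["datatype"] = datatype
    return json.dumps(query)

query_json = build_share_query({"prop_cm_name": "45445656", "prop_cm_title": ""})
# The JSON has to be percent-encoded when passed as a request parameter.
encoded = urlencode({"query": query_json})
print(query_json)
print(encoded)
```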

Related

Use different search terms for different columns

We are using Azure Search to find courses from a list. We search on three fields. We need fuzzy searches on the Coursename and Keywords, but want only to include exact matches for the course code (which has sequential numeric codes like "RB046").
Using the Search Explorer, you can do something like this with the URL:
https://xxx.search.windows.net/indexes/prospectussearchindexlive/docs?api-version=2016-09-01&search=CourseCode:"HCN_6_006" OR Coursename:"HCN_6_006~1" OR Keywords:"HCN_6_006~1"
But in the API it seems you can only have one search term applied to all specified columns. Does anyone know of a way you can do this with the API without performing two searches?
As Bruce Johnston pointed out in the comments, the feature set (especially with respect to search query syntax) should be largely identical between the REST API and the Azure Search .NET SDK. The Search Explorer in the Azure portal is literally a call into the REST API, so there shouldn't be any differences there.
The following search API call might translate to what you are looking for (I have included the POST version, you should be able to use GET as well if you'd like).
POST /indexes/prospectussearchindexlive/docs/search?api-version=2016-09-01
{
"search": "CourseCode:HCN_6_006 OR Coursename:HCN_6_006~1 OR Keywords:HCN_6_006~1",
"queryType": "full",
"searchMode": "all"
}
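For illustration, the same request body can be generated from a term plus lists of exact and fuzzy fields. This is only a sketch: the function name is made up, and it does not escape Lucene special characters in the term.

```python
import json

def build_search_body(term, exact_fields, fuzzy_fields, distance=1):
    """Combine exact and fuzzy fielded clauses into one full-Lucene query."""
    clauses = [f"{field}:{term}" for field in exact_fields]
    clauses += [f"{field}:{term}~{distance}" for field in fuzzy_fields]
    return {
        "search": " OR ".join(clauses),
        "queryType": "full",   # fielded and fuzzy syntax needs the full Lucene parser
        "searchMode": "all",
    }

body = build_search_body("HCN_6_006", ["CourseCode"], ["Coursename", "Keywords"])
print(json.dumps(body, indent=2))
```

Only one request is made; the per-field behaviour lives entirely in the "search" string.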
You should take a look at the Lucene syntax for Azure search, which is here: https://learn.microsoft.com/en-us/rest/api/searchservice/lucene-query-syntax-in-azure-search that will help you write different search queries.
You can also refer to the SDK documentation here: https://learn.microsoft.com/en-us/azure/search/search-howto-dotnet-sdk which talks about how to use the .NET SDK to perform search queries. Look at the Documents.Search method for more details.

Is there a Linq to REST library for C#

I have a LINQ expression and I want to convert it to a query string for REST, i.e.
public IQueryable<Organisation> Organisations;
...
var organisations = Organisations.Where(x => x.Name == "Bob");
becomes
http://restservice.com/Organisations?$filter=Name eq "Bob"
I did find one eventually in Linq2Rest (also a NuGet) which seems to fit the bill. Doesn't support OAuth but would be possible to build this in.
If you are in control of the data source, OData is what you are looking for.
A Google search brought up HttpEntityClient. I don't have any experience with it, but it looks useful.
I guess you could also write your own implementation because, frankly, REST APIs don't have to follow a particular standard when it comes to filtering, ordering, etc.
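To give an idea of what a hand-rolled implementation involves, here is a small sketch that builds a single-comparison OData $filter URL. The function names are hypothetical and only a handful of operators are handled. Note that OData string literals are single-quoted, so the filter comes out as Name eq 'Bob'.

```python
from urllib.parse import quote

def odata_literal(value):
    """Format a Python value as an OData literal."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    # Strings are single-quoted; embedded single quotes are doubled.
    return "'" + str(value).replace("'", "''") + "'"

def build_filter_url(base_url, field, op, value):
    """Build a URL carrying one '<field> <op> <literal>' filter expression."""
    if op not in {"eq", "ne", "gt", "ge", "lt", "le"}:
        raise ValueError(f"unsupported operator: {op}")
    expression = f"{field} {op} {odata_literal(value)}"
    return f"{base_url}?$filter={quote(expression)}"

print(build_filter_url("http://restservice.com/Organisations", "Name", "eq", "Bob"))
```

A real LINQ provider would walk the expression tree to produce the field, operator, and value; this only shows the final string-building step.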
PocoHttp can do exactly what you want. Moreover, it can do the call to the service and deserialize entities for you.
You can also easily modify its ODataProvider to support additional OData native functions (length, startswith, etc.)
Early pre-release versions of the OData Library had a query string parser, but expression building was never fully implemented, and the feature was then dropped. It's a major hole in the library, since without it you are left with payload and some header support only.
Fortunately Linq2Rest does exactly what you need, with one line of code:
var organisations = Organisations.sources.Filter(Request.Params).OfType<Organisation>();
The cast is necessary because a query string can select against the collection, producing a different collection of types. If you are only predicating on properties, then you don't care about that.
I found that DataServiceContext developed by Microsoft works much smoother than Linq2Rest and HttpEntityClient third party libraries mentioned here.
Documentation is also much better. The downside is that DataServiceContext works with XML only (no JSON).
But both WebAPI OData REST services and WCF Data Services can return XML if a client requests it in an HTTP header. Because XML support takes no additional development work, the lack of JSON support is unlikely to be an issue.
There are LINQ to REST examples using DataServiceContext: http://msdn.microsoft.com/en-us/library/windowsazure/dd894039.aspx
Try OData.
The Open Data Protocol (OData) is a Web protocol for querying and updating data that provides a way to unlock your data and free it from silos that exist in applications today. OData does this by applying and building upon Web technologies such as HTTP, Atom Publishing Protocol (AtomPub) and JSON to provide access to information from a variety of applications, services, and stores. The protocol emerged from experiences implementing AtomPub clients and servers in a variety of products over the past several years. OData is being used to expose and access information from a variety of sources including, but not limited to, relational databases, file systems, content management systems and traditional Web sites.
EDIT 1:
Also take a look here:
http://paulhammant.com/2012/02/13/client-side-mvc-frameworks-compared/

get top 20 google results c# api

I've been trying to write something similar to this but without any success so I was wondering if there is any google API or any other "function" which would allow me to do the following
List<string> GetTop20Links (string keyword)
{
//code to download and return top 20 results (links) in List<string> format
}
I prefer to use Google API for .NET.
As far as I can determine from Google's blogs, there once was (and maybe still is) a SOAP web service that let you query structured search results. But you need a so-called API key for the query and they don't give them out any more. The successor to this service was claimed to be the AJAX Search API, but I cannot find any current reference to it.
On the Google API page there is a custom search service, but you have to give a specific set of websites that the search includes, and you either need to show the ads along with the results or pay a fee for the usage.
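If the custom search service is acceptable, a sketch of fetching 20 links through its JSON API could look like the following. This assumes the Custom Search JSON API, which, as far as I know, returns at most 10 results per request, so two pages are needed. The API key and engine ID are placeholders, and the sample response is made up to show the shape being read.

```python
from urllib.parse import urlencode

ENDPOINT = "https://www.googleapis.com/customsearch/v1"

def build_page_urls(keyword, api_key, engine_id, total=20, page_size=10):
    """Return one request URL per page of results."""
    urls = []
    for start in range(1, total + 1, page_size):   # "start" is 1-based
        params = {"key": api_key, "cx": engine_id, "q": keyword,
                  "num": page_size, "start": start}
        urls.append(ENDPOINT + "?" + urlencode(params))
    return urls

def extract_links(response):
    """Pull the result links out of one parsed JSON response."""
    return [item["link"] for item in response.get("items", [])]

urls = build_page_urls("alfresco cmis", "YOUR_API_KEY", "YOUR_ENGINE_ID")
# Made-up response fragment, only to show which fields are read.
sample = {"items": [{"link": "https://example.com/a"}, {"link": "https://example.com/b"}]}
print(len(urls), extract_links(sample))
```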
You see, it's not in Google's interest to let somebody easily query their search engine and then just use the results for whatever purpose. They make money through the ads; that's their business model.
So if you want to implement that function you would have to turn to HTML scraping, which is ugly at best, tends to break often, and is difficult to get right.
BTW: You can do that quite easily with Bing. There is a link to the Bing Search API here and a code sample here

What is the easiest way to programmatically extract structured data from a bunch of web pages?

What is the easiest way to programmatically extract structured data from a bunch of web pages?
I am currently using an Adobe AIR program I have written to follow the links on one page and grab a section of data off of the subsequent pages. This actually works fine, and for programmers I think this (or other languages) provides a reasonable approach, to be written on a case-by-case basis. Maybe there is a specific language or library that allows a programmer to do this very quickly, and if so I would be interested in knowing what it is.
Also do any tools exist which would allow a non-programmer, like a customer support rep or someone in charge of data acquisition, to extract structured data from web pages without the need to do a bunch of copy and paste?
If you search Stack Overflow for WWW::Mechanize and pQuery you will see many examples using these Perl CPAN modules.
However, because you mentioned "non-programmer", perhaps the Web::Scraper CPAN module may be more appropriate? It's more DSL-like and so perhaps easier for a "non-programmer" to pick up.
Here is an example from the documentation for retrieving tweets from Twitter:
use URI;
use Web::Scraper;

# Describe what to pull out of each tweet list item.
my $tweets = scraper {
    process "li.status", "tweets[]" => scraper {
        process ".entry-content", body => 'TEXT';
        process ".entry-date", when => 'TEXT';
        process 'a[rel="bookmark"]', link => '@href';
    };
};

my $res = $tweets->scrape( URI->new("http://twitter.com/miyagawa") );

for my $tweet (@{$res->{tweets}}) {
    print "$tweet->{body} $tweet->{when} (link: $tweet->{link})\n";
}
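For comparison, the same idea (declare what to pick out, then run it over a page) can be sketched in Python with only the standard library. This is not equivalent to Web::Scraper's CSS selectors; it just collects every link and its text, and the sample HTML is invented for the demonstration.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect (href, text) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None     # href of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

SAMPLE_HTML = ('<ul><li><a href="/p/1">First item</a></li>'
               '<li><a href="/p/2">Second <b>item</b></a></li></ul>')

parser = LinkExtractor()
parser.feed(SAMPLE_HTML)
print(parser.links)
```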
I found YQL to be very powerful and useful for this sort of thing. You can select any web page from the internet and it will make it valid and then allow you to use XPATH to query sections of it. You can output it as XML or JSON ready for loading into another script/ application.
I wrote up my first experiment with it here:
http://www.kelvinluck.com/2009/02/data-scraping-with-yql-and-jquery/
Since then YQL has become more powerful with the addition of the EXECUTE keyword, which allows you to write your own logic in JavaScript and run it on Yahoo!'s servers before the data is returned to you.
A more detailed writeup of YQL is here.
You could create a datatable for YQL to get at the basics of the information you are trying to grab, and then the person in charge of data acquisition could write very simple queries (in a DSL which is pretty much English) against that table. It would be easier for them than "proper programming" at least...
There is Sprog, which lets you graphically build processes out of parts (Get URL -> Process HTML Table -> Write File), and you can put Perl code in any stage of the process, or write your own parts for non-programmer use. It looks a bit abandoned, but still works well.
I use a combination of Ruby with hpricot and watir, which gets the job done very efficiently.
If you don't mind it taking over your computer, and you happen to need JavaScript support, WatiN is a pretty damn good browsing tool. Written in C#, it has been very reliable for me in the past, providing a nice browser-independent wrapper for running through and getting text from pages.
Are commercial tools viable answers? If so, check out http://screen-scraper.com/; it is super easy to set up and use to scrape websites. They have a free version which is actually fairly complete. And no, I am not affiliated with the company :)

Best way to build a search function

I have a website that has over 400,000 items. Some similar, some vastly different. We want to provide a way to search these items the best way possible. After being delivered the website it was using full text indexing. The solution is basic at best, woefully inadequate at worst.
So what is the best way to search these items? They are stored in a SQL Server Database (2005). Our website is designed in C# 2.0.
Currently here is the process:
User enters value into text box.
We 'clean' this entry. Removing 'scary' characters that could be an attempted hack. Remove key words (and, or, etc..)
Pass value into a stored procedure to return results.
Return results.
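The 'clean' step described above might look roughly like this. The stop-word list and function name are assumptions made for illustration; whatever the cleaning does, the value should still reach the stored procedure as a parameter rather than being concatenated into SQL.

```python
import re

STOP_WORDS = {"and", "or", "not", "the", "a", "an", "of"}  # illustrative list

def clean_query(raw):
    """Return the search terms left after stripping punctuation and stop words."""
    # Keep word characters and whitespace; everything else becomes a space.
    text = re.sub(r"[^\w\s]", " ", raw.lower())
    return [word for word in text.split() if word not in STOP_WORDS]

terms = clean_query("Red AND blue widgets'; DROP TABLE items--")
print(terms)
```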
Look at Lucene.NET. I think it's a vast improvement over full-text search in SQL Server.
SQL Server Central has a nice article on creating a Google-like full-text search using SQL Server. Unfortunately you have to register to view the full article, but registration is free and they post a lot of good information. Here is the link:
http://www.sqlservercentral.com/articles/Full-Text+Search+(2008)/64248/
Excerpt:
...
Google Style
The key to a successful application is to make it easy to use but powerful. Google has done this with their Web search engine. The syntax for queries is simple and intuitive, but full-featured. Though the basic building blocks of a Google query are simple you can combine them in powerful ways. I'll begin with basic Google query syntax and add some additional operators to take advantage of the power of SQL Server CONTAINS predicate syntax. The full Google syntax is defined in the Google Help:Cheat Sheet at http://www.google.com/help/cheatsheet.html.
...
The article has full example code and even a link to download it. It's an interesting read even if you don't plan on implementing it.
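To make the idea concrete, here is a very small sketch (not the article's code) that turns a Google-style query with quoted phrases and -excluded terms into a search condition string for CONTAINS. The function name is hypothetical, and the result should be passed to the query as a parameter.

```python
import shlex

def to_contains_condition(query):
    """Translate 'a "b c" -d' into a CONTAINS search condition string."""
    include, exclude = [], []
    # shlex.split keeps quoted phrases together; it raises ValueError on
    # unbalanced quotes, so real input needs extra handling.
    for token in shlex.split(query):
        target = exclude if token.startswith("-") else include
        term = token.lstrip("-").replace('"', "")
        if term:
            target.append(f'"{term}"')
    if not include:
        raise ValueError("at least one non-excluded term is required")
    condition = " AND ".join(include)
    for term in exclude:
        condition += f" AND NOT {term}"
    return condition

print(to_contains_condition('sql "full text" -oracle'))
```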
You can have a look at Lucene.net, it will minimize the calls to the database for the search queries.
Following from http://incubator.apache.org/lucene.net/
Lucene.Net is a source code, class-per-class, API-per-API and algorithmatic port of the Java Lucene search engine to the C# and .NET platform utilizing Microsoft .NET Framework.
Lucene.Net sticks to the APIs and classes used in the original Java implementation of Lucene. The API names as well as class names are preserved with the intention of giving Lucene.Net the look and feel of the C# language and the .NET Framework. For example, the method Hits.length() in the Java implementation now reads Hits.Length() in the C# port.
In addition to the APIs and classes port to C#, the algorithm of Java Lucene is ported to C# Lucene. This means an index created with Java Lucene is back-and-forth compatible with the C# Lucene; both at reading, writing and updating. In fact a Lucene index can be concurrently searched and updated using Java Lucene and C# Lucene processes.
You could use Google site search to deliver your search results. Doesn't always give you the flexibility to display the results as you want, but for many is good enough.
The second step is quite controversial: which words do you consider 'scary'? If you use SQL Server's built-in full-text search, then instead of manually removing keywords from the input query you can set up lists of noise/stop words inside SQL Server.
Here is one feature I want to see here on StackOverflow as well as on any other site that provides search functionality:
give more priority(weight) to some fields of your documents
(in case of stackoverflow - search should prioritize topic title)
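As a toy illustration of field weighting, a hit in the title can simply count for more than a hit in the body. The weights, field names, and sample documents below are assumptions; engines such as Lucene expose this as field boosts rather than hand-written scoring.

```python
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0}  # assumed weights

def score(document, terms):
    """Sum weighted term counts across the document's fields."""
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = document.get(field, "").lower().split()
        total += weight * sum(words.count(term.lower()) for term in terms)
    return total

docs = [
    {"title": "Database tuning", "body": "Full-text search can replace LIKE queries"},
    {"title": "Lucene search tips", "body": "Indexing and querying basics"},
]
ranked = sorted(docs, key=lambda d: score(d, ["search"]), reverse=True)
print([d["title"] for d in ranked])
```

Both documents contain the term once, but the one with the match in its title ranks first.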
Also consider using a third-party solution for FTS such as Lucene or Sphinx; they can provide a much better user experience than the built-in functionality.
Some advantages of 3rd party FTS components are: reduced database load, better relevance of search results, better indexing speed, smaller size of database.
