How to Move Lucene Index results into SQL Server Database - c#

I have a little over 1 million records in my Lucene index and would like to move them into a new database so I can more easily do advanced searching, join them with my existing tables, and so on. I have done some searching and haven't found a good, fast way to take my existing Lucene index files and move them into a SQL database.
Any help or pointers in the right direction would be appreciated.
Details: My SQL database is Microsoft SQL Server (managed through SQL Server Management Studio). My application, which creates the Lucene index, is a web scraper written in C#.
EDIT: I am using Lucene.Net.

Not the answer you're looking for, but I'd just like to point out that an index and a relational database are two vastly different things. Unless you're storing all the data in the index as well, I really don't think what you're trying to do is possible.
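If the fields were indexed as stored, a one-off export is feasible in principle: read each document's stored fields and bulk-insert them in batches. The following is only a minimal sketch under stated assumptions. Python's sqlite3 stands in for SQL Server, plain dictionaries stand in for documents read from the index, and the table name, column names, and `export_documents` helper are all hypothetical.

```python
import sqlite3
from itertools import islice

def batched(iterable, size):
    """Yield lists of up to `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def export_documents(conn, documents, batch_size=1000):
    """Insert stored-field dicts into a SQL table, one transaction per batch.

    Returns the number of rows inserted. Fields that were not stored in the
    index are simply missing from the dict and end up as NULL.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ScrapedPages ("
        "Url TEXT PRIMARY KEY, Title TEXT, Body TEXT)"
    )
    total = 0
    for chunk in batched(documents, batch_size):
        rows = [(d["url"], d.get("title"), d.get("body")) for d in chunk]
        with conn:  # commits on success, rolls back on error
            conn.executemany(
                "INSERT INTO ScrapedPages (Url, Title, Body) VALUES (?, ?, ?)",
                rows,
            )
        total += len(rows)
    return total

# Hypothetical stand-in for iterating over every document in the index.
docs = (
    {"url": f"http://example.com/{i}", "title": f"Page {i}", "body": "text"}
    for i in range(2500)
)
conn = sqlite3.connect(":memory:")
inserted = export_documents(conn, docs)
```

Batching keeps each transaction small, which matters more at a million rows than it does here.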

Putting your Lucene index into a database negates the purpose of indexing. The main advantage of Lucene is extremely fast, relevant searches over huge amounts of text. Instead of putting the index into the database, you might as well just use SQL Server full-text search.
I think you should reconsider your requirements and either switch to SQL Server full-text search or use the standard Lucene searching mechanisms.

Related

How to use entity framework with elastic search

I want to use Entity Framework with Elasticsearch.
I saw this article: MVC APPLICATION WITH ENTITY FRAMEWORK AND ELASTICSEARCH.
But as I understand it, I need two databases (MS SQL and Elasticsearch), and the article explains how to transfer the data between them.
I would save the data in MS SQL and run the searches against Elasticsearch.
So all the data would be stored twice, which is a waste of storage.
Is there any direct way to do that?
Thanks
You can use Entity Framework with Elasticsearch by using the ElasticsearchCrud API.
This article clearly explains the steps to do so.
P.S.: I'd rather not copy and paste the steps here, as that would be redundant.
Yes, you understood correctly: you would need to use two different sources.
There is no direct way to use Elasticsearch with EF; you would need to write your own custom logic to fit the database and Elasticsearch together.
Why? Because a database and Elasticsearch are different things.
First of all, Elasticsearch is a document database, so you should save the whole object as one document, whereas in a relational database you can split items across multiple tables. (In Elasticsearch you can still use nested objects, but you will not be able to join.)
Secondly, search queries are totally different in SQL and Elasticsearch, so sometimes you will have to decide which source to search. To search Elasticsearch you can use the NEST package, but you will need to learn Elasticsearch queries and indexing, since results differ depending on how fields are analyzed.
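One common shape for that custom logic is: write each record to the database, push only its searchable text and ID to the search engine, and at query time ask the search engine for IDs and load the full rows from the database. The sketch below illustrates the idea only. It uses Python's sqlite3 in place of SQL Server and a tiny in-memory class in place of Elasticsearch; `TinySearchIndex`, `save_product`, `search_products`, and the `Products` table are hypothetical names.

```python
import sqlite3
from collections import defaultdict

class TinySearchIndex:
    """In-memory stand-in for a search engine: maps each word to document IDs."""

    def __init__(self):
        self._postings = defaultdict(set)

    def index(self, doc_id, text):
        for word in text.lower().split():
            self._postings[word].add(doc_id)

    def search(self, query):
        """Return IDs of documents containing every word in the query."""
        words = query.lower().split()
        if not words:
            return set()
        return set.intersection(*(self._postings.get(w, set()) for w in words))

def save_product(conn, index, product_id, name, description):
    """Write the row to the database, then index only the searchable text."""
    with conn:
        conn.execute(
            "INSERT INTO Products (Id, Name, Description) VALUES (?, ?, ?)",
            (product_id, name, description),
        )
    index.index(product_id, f"{name} {description}")

def search_products(conn, index, query):
    """Ask the index for matching IDs, then load full rows from the database."""
    ids = sorted(index.search(query))
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    return conn.execute(
        f"SELECT Id, Name FROM Products WHERE Id IN ({placeholders}) ORDER BY Id",
        ids,
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (Id INTEGER PRIMARY KEY, Name TEXT, Description TEXT)")
index = TinySearchIndex()
save_product(conn, index, 1, "Red bike", "A fast road bike")
save_product(conn, index, 2, "Blue helmet", "Fits with any bike")
results = search_products(conn, index, "red bike")
```

Here the index holds only words and IDs, so the duplication is limited to the searchable text rather than every column.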

Indexing sequence steps for open source Lucene, working with SQL or Nosql

I am new to open source.
I have a question before I dive into what I plan to do. Assuming I plan to use C# with a NoSQL store (I have not yet decided which one: RavenDB or MongoDB), I want to do indexing for an ASP.NET site.
I would like to use Lucene.Net for indexing data and page links on my site. When do you actually tell Lucene.Net to start indexing?
I mean, is it a background process that starts indexing every night, like the SharePoint indexes, or do you index a record at the moment you insert it into the NoSQL store?
What about links on pages: when should the crawl engine run? I guess I am thinking in SharePoint terms and need to be corrected by some people on this board.
I am particularly interested in the sequence of steps. I am sorry, but I am failing to understand the when and the why.
Any explanation or links to examples would help.
Appreciate your help.
Thanks
Sweety
Lucene is a search engine, not a crawler. So you would need to find a crawler which inserts the data into the Lucene index.
Think of Lucene as a SQL server. It can store data and retrieve data based on queries. But you have to create the application which actually inserts and queries the data.
You could very well use Solr (built on top of Lucene) and Nutch, both Java projects, and use web services to communicate between your C# app and the search index. The Java version of Lucene is also under constant development, while the .NET version is somewhat up in the air.
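On the "when" question, both options from the question are workable because the application decides: it can update the index right after each insert, or run a scheduled job that picks up everything changed since the last run. Below is a minimal sketch of the scheduled variant, assuming each row carries a last-modified timestamp. Python's sqlite3 stands in for the data store and a plain dict stands in for the search index; the `Pages` table and `index_changes` function are hypothetical.

```python
import sqlite3

def index_changes(conn, index, last_run):
    """Index rows modified after `last_run`; return the new watermark."""
    rows = conn.execute(
        "SELECT Id, Title, Body, UpdatedAt FROM Pages "
        "WHERE UpdatedAt > ? ORDER BY UpdatedAt",
        (last_run,),
    ).fetchall()
    for page_id, title, body, updated_at in rows:
        index[page_id] = f"{title} {body}"  # add or replace the document
        last_run = updated_at
    return last_run

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Pages (Id INTEGER PRIMARY KEY, Title TEXT, Body TEXT, UpdatedAt TEXT)"
)
with conn:
    conn.executemany(
        "INSERT INTO Pages VALUES (?, ?, ?, ?)",
        [
            (1, "Home", "welcome", "2024-01-01T10:00:00"),
            (2, "About", "who we are", "2024-01-02T10:00:00"),
        ],
    )

index = {}
watermark = index_changes(conn, index, "")  # first run indexes everything

with conn:
    conn.execute(
        "UPDATE Pages SET Title = ?, UpdatedAt = ? WHERE Id = ?",
        ("Start", "2024-01-03T09:00:00", 1),
    )
watermark_after_edit = index_changes(conn, index, watermark)  # picks up row 1 only
```

ISO-8601 timestamps are used so that string comparison matches chronological order. Indexing at write time would call the same update step immediately after the insert instead of waiting for the job.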

ASP.NET C# Search in a SQL Server Database Table

I'm building a portal, which isn't a blogging engine, but is quite similar to one. In SQL Server, I have a database containing a table that is the basis for the "posts". This Posts table includes the following columns:
ID
Author(s)
Tags
Title
Markdown post content
This is my first time building such a portal, and I'd like to implement some sort of ASP.NET search over these rows, preferably using all of the properties (columns), except for the ID one. Also, in the long run, I'm considering the possibility of implementing a search of this and comments to those posts, which would be stored in a different table.
Are there any open-source implementations or example code online for accomplishing this search? If not, how can I get started? Could you point me towards some tutorials with sample code on how to accomplish this with ASP.NET and C#? Also, has Google (or some other company) created anything for this?
I hope my question isn't too broad or vague. Thanks in advance!
Are you using SQL Server 2008? If so, you could leverage the full-text search features built right into it. Then it would be as simple as passing the user's (sanitized) input into a SQL query as a parameter.
For example, here's a query that would search the Author, Title and PostContent columns for the user's input text.
SELECT Author, Title
FROM Posts
WHERE CONTAINS((Author, Title, PostContent), @userInput);
SQL Server 2008 supports different search methods too, like simple term, weighted term, synonym, proximity and prefix searches... it's pretty awesome.
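CONTAINS expects a search condition rather than raw text, so free-form input usually needs to be turned into quoted terms first. A minimal sketch of one way to do that follows; the function name and the choice to AND all words together are assumptions, not something the answer prescribes.

```python
import re

def build_contains_condition(user_input, prefix=False):
    """Turn free-text input into a CONTAINS search condition.

    Each word becomes a double-quoted term and the terms are joined with AND.
    Punctuation (including quotes typed by the user) is discarded.
    Returns None when the input has no usable words.
    """
    words = re.findall(r"\w+", user_input)
    if not words:
        return None
    suffix = "*" if prefix else ""
    return " AND ".join(f'"{w}{suffix}"' for w in words)

condition = build_contains_condition('lucene "full text" search!')
# condition is: "lucene" AND "full" AND "text" AND "search"
```

The resulting string would then be bound to the @userInput parameter rather than concatenated into the SQL text.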
Have you thought of implementing search through the use of Full-Text Search? It seems like a great scenario for it. This link might provide useful information on architecture and development of Full-Text Search. http://msdn.microsoft.com/en-us/library/ms142571.aspx
If this is publicly exposed, you can always have Google index it. Google has an appliance which you can purchase to do this as well.
If you don't want to roll your own, you could look at:
SharePoint (regular [comes with Win 2003+] or portal server)
Solr (Apache Lucene project)
If you want to roll your own, I suggest looking into SQL Server Analysis Services to build the search indexes for you on a regular basis.
I agree with Womp - Full text index is the way to go. You could also look at NLucene if you need to index more than just the database table.
I think you should try Lucene, which you can use to index all your data and build a really good search engine.
I can give more information about that if you like.

How can I efficiently search data in a database without using full-text search

I want to search for a sentence (a combination of words) in a table or view of a database. I don't want to use the database's full-text search feature. Is there an efficient alternative?
Without the use of an index, a database has to perform a "full table scan". This is rather like you looking through a book one page at a time to find what you need.
That being said, computers are a lot faster than humans. It really depends on how much load your system has. Using MySQL we successfully implemented a search system on a table of lead information. The nature of the problem was one that could not be solved by normal indexes (including full text). So we designed it to be powered using a full table scan.
That involved creating tables as narrow as possible with the search data, and joining them to a larger table with related, but non-search data.
At the time (4 years ago), 100,000 records could be scanned in .06 seconds. 1,000,000 records took about .6 seconds. The system is still in heavy production use with millions of records.
If your data needs exceed 6 digits of records, you may want to re-evaluate using a full text index, or do some research on inverted indexes.
Please comment if you would like any more info.
Edit: The search tables were kept as narrow as possible. Ideally 50-100 bytes per record. ENUMS and TINYINT are great space savers if you can use them to "map" to string values another way.
The search queries were generated using a PHP class. They were simply:
-- MainTable is the big table that holds all of the data
-- SearchTable is the narrow table that holds the bits of searchable data
SELECT
    MainTable.ID,
    MainTable.Name,
    MainTable.Whatever
FROM
    MainTable, SearchTable
WHERE
    MainTable.ID = SearchTable.ID
    AND SearchTable.State IN ('PA', 'DE')
    AND SearchTable.Age < 40
    AND SearchTable.Status = 3
Essentially, the two tables were joined on a primary key (fast), and the filtering was done by full table scan on the SearchTable (pretty fast). We were using MySQL.
We found that by using the fixed row format (ROW_FORMAT=FIXED) for the MyISAM tables, we could increase performance by 3x. This meant no blobs, no varchars, etc.
Let me know if this helps.
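The narrow-table layout described above can be tried out in a few lines. This is only an illustration: it uses Python's sqlite3 rather than MySQL, the sample rows are made up, and the `search` helper is hypothetical. The table and column names follow the query in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MainTable (ID INTEGER PRIMARY KEY, Name TEXT, Whatever TEXT);
CREATE TABLE SearchTable (ID INTEGER PRIMARY KEY, State TEXT, Age INTEGER, Status INTEGER);
""")

# (ID, Name, Whatever, State, Age, Status): made-up sample data
people = [
    (1, "Alice", "notes", "PA", 35, 3),
    (2, "Bob", "notes", "NJ", 30, 3),
    (3, "Carol", "notes", "DE", 45, 3),
    (4, "Dave", "notes", "DE", 22, 3),
    (5, "Eve", "notes", "PA", 29, 1),
]
with conn:
    conn.executemany(
        "INSERT INTO MainTable VALUES (?, ?, ?)",
        [(p[0], p[1], p[2]) for p in people],
    )
    conn.executemany(
        "INSERT INTO SearchTable VALUES (?, ?, ?, ?)",
        [(p[0], p[3], p[4], p[5]) for p in people],
    )

def search(conn, states, max_age, status):
    """Filter on the narrow table, then join to the wide table by primary key."""
    placeholders = ",".join("?" * len(states))
    sql = (
        "SELECT m.ID, m.Name FROM SearchTable s "
        "JOIN MainTable m ON m.ID = s.ID "
        f"WHERE s.State IN ({placeholders}) AND s.Age < ? AND s.Status = ? "
        "ORDER BY m.ID"
    )
    return conn.execute(sql, [*states, max_age, status]).fetchall()

matches = search(conn, ["PA", "DE"], 40, 3)
```

The filter values are passed as parameters, so the same statement shape serves any combination of states, age limit, and status.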
None are as efficient as full-text search.
Basically it boils down to a WHERE clause with LIKE and its variants, and since indexes cannot be used in most of those scenarios, it becomes a very expensive query.
If you are using Java, have a look at Lucene.
If you are using .NET, you can have a look at Lucene.Net; it will minimize the calls to the database for the search queries.
The following is from http://incubator.apache.org/lucene.net/:
Lucene.Net is a source code, class-per-class, API-per-API and algorithmatic port of the Java Lucene search engine to the C# and .NET platform utilizing Microsoft .NET Framework.
Lucene.Net sticks to the APIs and classes used in the original Java implementation of Lucene. The API names as well as class names are preserved with the intention of giving Lucene.Net the look and feel of the C# language and the .NET Framework. For example, the method Hits.length() in the Java implementation now reads Hits.Length() in the C# port.
In addition to the APIs and classes port to C#, the algorithm of Java Lucene is ported to C# Lucene. This means an index created with Java Lucene is back-and-forth compatible with the C# Lucene; both at reading, writing and updating. In fact a Lucene index can be concurrently searched and updated using Java Lucene and C# Lucene processes.
You could break up the text into individual words, store them in a separate table, and use that table to find the primary-key IDs of rows that contain all the words in your search sentence (though not necessarily in the right order). Then search just those rows for the sentence itself. That should avoid having to do a table scan every time.
Please ask if you need me to explain further.
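A minimal sketch of that word-table approach, using Python's sqlite3 as a stand-in database. The `Docs` and `DocWords` tables and the helper names are hypothetical, and `INSERT OR IGNORE` is SQLite syntax that would need an equivalent elsewhere.

```python
import re
import sqlite3

def words_of(text):
    """Split text into lowercase words."""
    return re.findall(r"\w+", text.lower())

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Docs (ID INTEGER PRIMARY KEY, Body TEXT);
CREATE TABLE DocWords (Word TEXT, DocID INTEGER, PRIMARY KEY (Word, DocID));
""")

def add_doc(conn, doc_id, body):
    """Store the row and one (word, id) entry per distinct word."""
    with conn:
        conn.execute("INSERT INTO Docs VALUES (?, ?)", (doc_id, body))
        conn.executemany(
            "INSERT OR IGNORE INTO DocWords VALUES (?, ?)",
            [(w, doc_id) for w in words_of(body)],
        )

def find_sentence(conn, sentence):
    """Step 1: use the word table to find rows containing every word.
    Step 2: check only those rows for the exact sentence."""
    words = sorted(set(words_of(sentence)))
    if not words:
        return []
    placeholders = ",".join("?" * len(words))
    candidates = conn.execute(
        "SELECT d.ID, d.Body FROM Docs d "
        "JOIN DocWords w ON w.DocID = d.ID "
        f"WHERE w.Word IN ({placeholders}) "
        "GROUP BY d.ID, d.Body "
        "HAVING COUNT(DISTINCT w.Word) = ? "
        "ORDER BY d.ID",
        [*words, len(words)],
    ).fetchall()
    needle = sentence.lower()
    return [doc_id for doc_id, body in candidates if needle in body.lower()]

add_doc(conn, 1, "The quick brown fox")
add_doc(conn, 2, "A brown fox that is quick")
add_doc(conn, 3, "Slow green turtle")
hits = find_sentence(conn, "quick brown")
```

Rows 1 and 2 both contain the two words, so both are candidates, but only row 1 contains them as a phrase. The primary key on (Word, DocID) lets the database look words up by index instead of scanning the text.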

Create a Search Engine with SQL 2000 and ASP.NET C#

I am looking to create a search engine that will be based on 5 columns in a SQL 2000 DB. I have looked into Lucene.NET and read the documentation on it, but I am wondering if anyone has any previous experience with this?
Thanks
IMHO it's not so much about performance as about maintainability. In order to index your content using Lucene.NET, you'll have to create some mechanism (a service or a trigger) that adds new rows to the Lucene index and removes deleted rows from it.
From a beginner's perspective I think it's probably easier to use the SQL Server built-in full text search engine.
I haven't dealt with Lucene yet, but a friend of mine has, and he said that their performance was 4 to 5 times better with Lucene than with full-text indexing.
Performance better? I think that largely depends on volume and how you expect the data to scale.
SQL Server Full Text is far superior in my opinion. To get this to work with Lucene you will need a process to maintain the index by extracting data from the SQL database.
You can either use a Lucene index or a SQL FTS index. I personally lean toward Lucene from a simplicity standpoint. It is also not a black box. Which solution will work (and both may work) depends a lot on query load, data size and data update frequency. Lucene does provide a well-worn path to building very scalable search solutions for websites. In the future, please include some more information about your problem.
