I'm building a portal, which isn't a blogging engine, but is quite similar to one. In SQL Server, I have a database containing a table that is the basis for the "posts". This Posts table includes the following columns:
ID
Author(s)
Tags
Title
Markdown post content
This is my first time building such a portal, and I'd like to implement some sort of ASP.NET search over these rows, preferably using all of the properties (columns), except for the ID one. Also, in the long run, I'm considering the possibility of implementing a search of this and comments to those posts, which would be stored in a different table.
Are there any open-source implementations or example code online for accomplishing this search? If not, how can I get started? Could you point me towards some tutorials with sample code on how to accomplish this with ASP.NET and C#? Also, has Google (or some other company) created anything for this?
I hope my question isn't too broad or vague. Thanks in advance!
Are you using SQL Server 2008? If so, you could leverage the full-text search features built right into it. Then it would be as simple as passing the user's (sanitized) input into a SQL query as a parameter.
For example, here's a query that would search the Author, Title and PostContent fields for the user's inputted text.
SELECT Author, Title FROM Posts
WHERE CONTAINS((Author, Title, PostContent), @userInput);
SQL Server 2008 supports different search methods too, like simple token, weighted word values, synonym, proximity and prefix searches... it's pretty awesome.
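As a rough sketch of the "sanitized input" step, the user's text can be reduced to plain terms and rebuilt as a search condition before it is handed to CONTAINS as a parameter. The helper name build_contains_condition and the `?` placeholder style are assumptions for illustration (the placeholder syntax depends on the database driver), and combining terms with AND is only one possible policy.

```python
import re


def build_contains_condition(user_input: str, prefix: bool = False) -> str:
    """Turn free-form user text into a CONTAINS search condition string."""
    # Keep only word-like tokens; this drops quotes and operators the user typed.
    terms = re.findall(r"\w+", user_input)
    if not terms:
        raise ValueError("no searchable terms in input")
    if prefix:
        # "term*" asks for a prefix match on that term.
        quoted = [f'"{t}*"' for t in terms]
    else:
        quoted = [f'"{t}"' for t in terms]
    # Require every term to be present.
    return " AND ".join(quoted)


# Hypothetical parameterized query; the condition is passed as a parameter,
# never concatenated into the SQL text.
QUERY = (
    "SELECT Author, Title FROM Posts "
    "WHERE CONTAINS((Author, Title, PostContent), ?);"
)

condition = build_contains_condition('full-text "search"')
print(condition)
```

The resulting string would be supplied as the single query parameter, so the SQL text itself never contains user input.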
Have you thought of implementing search through the use of Full-Text Search? It seems like a great scenario for it. This link might provide useful information on architecture and development of Full-Text Search. http://msdn.microsoft.com/en-us/library/ms142571.aspx
If this is publicly exposed, you can always have Google index it. Google has an appliance which you can purchase to do this as well.
If you don't want to roll your own you could look at:
SharePoint (regular [comes with Win 2003+] or portal server)
Solr (Apache Lucene project)
If you want to roll your own, I suggest looking into SQL Server Analysis Services to build the search indexes for you on a regular basis.
I agree with Womp - Full text index is the way to go. You could also look at NLucene if you need to index more than just the database table.
I think you should try Lucene, which you can use to index all your data and build a really good search engine.
I can give more information about that if you like.
I am doing a school project in C# apps and I decided to create a ticketing system.
I want to impress my teacher (^^) so I decided to add a database for my app.
I have a month to do this, so I think I can learn it even though I don't have any prior experience with databases.
Could you tell me how to do it? Below is my app, I want to send the info in the TextBox to a database
I already followed the instructions in MSDN which basically tells you how to add a data source in your app. I added northwind dataset to my app, but I don't know what to do with it and how will it be useful with my app...
For a SQL backend, you can use SQLite quite easily. SQLite is simply a file that resides on the local system, so it is totally portable/deployable with your application. It comes with the caveat that the database is not shared between users. It is a single user database. Two people running an application based on SQLite will not share data. For a uni assignment, this is probably not going to be a big deal.
You could also use SQL Server CE (compact edition), which is a stripped down SQL Server implementation which is similar to SQLite (local, embedded, single user). This will allow you to use Visual Studio database tools to design your database.
Once you have a database embedded within your application, you need to design a schema to hold on to this information. If your screenshot is the only data you need to save, a table like the following should do the trick:
TABLE PERSON
COLUMN name varchar(100)
COLUMN address varchar(200)
COLUMN email varchar(100)
COLUMN mobile varchar(15)
You will need to investigate how to create tables in SQL. That should guide you in what you need though. Some versions of Visual Studio also have a database browser/designer.
Then you need to decide how you want to communicate with the database. You have several options.
Linq 2 SQL
Entity Framework
DataTables
Scott Gu has an excellent series on how to use Linq 2 SQL which I would highly recommend reading. It will go the majority of the way to helping you get to where you need.
So now that you have a SQL database and a provider, you can start trying to wire up the database to the form. This is where databinding comes in. You can drag a Data Source onto a form (which is your Person table), and wire up the table to your text fields. There are many examples on the net how to do this.
If you want to take it a step further, look into the ErrorProvider control. It will allow you to bind validation to your data source and text fields. Once again, a few google searches should point you in the right direction.
I haven't provided code samples because this is homework. If you want to impress your teacher, you will do so by truly understanding the technology you're trying to use. These are just pointers in the right direction so you know what it is you can investigate. Best of luck.
That is a pretty broad question. Is there something specific that you need help with, like connecting to the database, using a DataReader, etc.?
If you want to impress your teacher, don't refer to MSDN. Use something like CouchDB. Don't get caught up in the "prescribed" .NET ecosystem.
I am new to open source.
I have a question before I dive into what I plan to do. Assuming I use C# with a NoSQL database (I haven't decided which one, RavenDB or MongoDB), I want to do indexing for a site in ASP.NET.
I would like to use Lucene.Net for indexing data and page links on my site. When do you actually tell Lucene.Net to start indexing?
I mean, is it a background process that starts indexing every night, just like the SharePoint indexes, or should you index a record at the moment you insert it into the NoSQL store?
How about links on pages: when should the crawl engine run? I guess I am thinking in terms of the SharePoint world and need to be corrected by some people on this board.
I am particularly interested in the sequence of steps. I am sorry, I am failing to understand the when and the why.
Any explanation or links to examples would help.
Appreciate your help.
Thanks
Sweety
Lucene is a search engine, not a crawler. So you would need to find a crawler which inserts the data into the Lucene index.
Think of Lucene as a SQL server. It can store data and retrieve data based on queries. But you have to create the application which actually inserts and queries the data.
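To make the "you insert, you query" point concrete, here is a toy inverted index in Python. It is not Lucene's API; the class TinyIndex and the function save_post are hypothetical names. It only illustrates that the application decides when indexing happens: here each record is indexed in the same code path that saves it, but the same add_document call could just as well run from a nightly batch job.

```python
import re
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: maps each lowercase term to the ids of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_document(self, doc_id, text):
        # The application chooses when to call this: on every save, or in a batch.
        for term in re.findall(r"\w+", text.lower()):
            self.postings[term].add(doc_id)

    def search(self, query):
        # Return the ids of documents that contain every query term.
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


def save_post(store, index, doc_id, text):
    store[doc_id] = text              # write to the primary data store (a dict here)
    index.add_document(doc_id, text)  # index immediately, in the same code path


store, index = {}, TinyIndex()
save_post(store, index, 1, "Getting started with Lucene")
save_post(store, index, 2, "Crawling pages and indexing links")
print(index.search("lucene"))
```

Indexing on write keeps search results current at the cost of slower saves; a scheduled batch trades freshness for cheaper writes. A crawler is a separate program that fetches pages and feeds their text to the same kind of add call.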
You could very well use Solr (built on top of Lucene) and Nutch, both Java projects, and use web services to communicate between your C# app and the search index. The Java version of Lucene is also under constant development, while the .NET version is somewhat up in the air.
I have a little over 1 million records in my lucene database and would like to move them into a new database so I can more easily do advanced searching and join it with my existing tables etc. I have done some searching and haven't found a good/fast way to take my existing lucene database files and move them into a sql database.
Any help would be appreciated or pointing me in the right direction.
Details: My SQL database is Microsoft SQL Server (managed through SQL Server Management Studio). The application that creates the Lucene database is a web scraper written in C#.
EDIT: I am using Lucene.net
Not the answer you're looking for, but I'd just like to point out that an index and a relational database are two vastly different things. Unless you're storing all the data in the index as well, I really don't think what you're trying to do is possible.
Putting your Lucene index in a DB negates the purpose of indexing. The main advantage of Lucene is extremely fast, relevant searches over huge amounts of text. Instead of putting the index into the DB, you might as well just use MSSQL Server full-text search.
I think you should consider your requirements once again and either switch to MSSQL full text search or use standard Lucene searching mechanisms.
I'm working on some stuff for an in-house CRM. The company's current frontend allows for lots of duplicates. I'm trying to stop end-users from putting in the same person because they searched for 'Bill Johnson' and not 'William Johnson.' So the user will put in some information about their new customer and we'll find the similar names (including fuzzy names) and match them against what is already in our database and ask if they meant those things... Does such a database or technology exist?
I implemented such functionality on one website. I use double_metaphone() + levenshtein() in PHP. I precalculate a double_metaphone() for each entry in the database, which I look up using a SELECT on the first x chars of the 'metaphoned' search term.
Then I sort the returned results according to their Levenshtein distance. double_metaphone() is not part of any PHP library (last time I checked), so I borrowed a PHP implementation I found somewhere a long while ago on the net (the site is no longer online). I should post it somewhere I suppose.
EDIT: The website is still in archive.org:
http://web.archive.org/web/20080728063208/http://swoodbridge.com/DoubleMetaPhone/
or Google cache:
http://webcache.googleusercontent.com/search?q=cache:Tr9taWl9hMIJ:swoodbridge.com/DoubleMetaPhone/+Stephen+Woodbridge+double_metaphon
which leads to many other useful links with source code for double_metaphone(), including one in Javascript on github: http://github.com/maritz/js-double-metaphone
EDIT: Went through my old code, and here are roughly the steps of what I do, pseudo coded to keep it clear:
1) Precompute a double_metaphone() for every word in the database, i.e., $word='blahblah'; $soundslike=double_metaphone($word);
2) At lookup time, compute the same key for the searched $word: $soundslike = double_metaphone($word)
3) SELECT * FROM table WHERE soundlike LIKE $soundslike (if you have levenshtein stored as a procedure, much better: SELECT * FROM table WHERE levenshtein(soundlike, $soundslike) < mythreshold ORDER BY levenshtein(word, $word) ASC LIMIT ... etc.)
It has worked well for me, although I can't use a stored procedure, since I have no control over the server and it's using MySQL 4.20 or something.
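The steps above can be sketched in Python as follows. Double Metaphone is not in the standard library, so this sketch swaps in a plain Soundex key as the phonetic code; that is a stand-in, not the algorithm described above, and the function names (soundex, levenshtein, build_index, fuzzy_lookup) and the sample surnames are made up for illustration.

```python
# Soundex digit for each consonant; vowels, h, w and y have no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Four-character Soundex key, used here as a stand-in phonetic code."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code
    return (key + "000")[:4]


def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def build_index(words):
    # Step 1: precompute the phonetic key for every stored word.
    index = {}
    for w in words:
        index.setdefault(soundex(w), []).append(w)
    return index


def fuzzy_lookup(word, index):
    # Step 2: compute the key of the searched word.
    # Step 3: fetch entries with the same key, closest spelling first.
    candidates = index.get(soundex(word), [])
    return sorted(candidates, key=lambda w: levenshtein(w.lower(), word.lower()))


index = build_index(["Johnson", "Jonson", "Johnston", "Jackson", "Williams"])
print(fuzzy_lookup("Jonsen", index))
```

In a database-backed version the precomputed key would live in its own indexed column, so the lookup in step 3 is an ordinary equality or prefix match and only the few returned rows need the more expensive distance calculation.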
I asked a similar question once: "Name Hypocorism List". I never did get around to doing anything with it, but the problem has come up again at work, so I might write and open-source a library in .NET for doing some matching.
Update:
I ported the Perl module I mentioned there to C# and put it up on GitHub. http://github.com/stimms/Nicknames
Implement the Levenshtein distance:
http://en.wikipedia.org/wiki/Levenshtein_distance
This can be written as a SQL Function and queried many different ways.
Well, SSIS has some fuzzy logic tasks that we use to find duplicates after the fact.
I think though you need to have your logic look at more than just the name for best results. If they are putting in address, email or phone information, perhaps you could look for people with the same last name with one or more of those other matches and ask if one of them will do. You could also make a table of nicknames for various names and match on that. You won't get all of them, but you could get some of the most common in your country at least.
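A minimal sketch of that idea in Python, assuming customers are plain dictionaries with hypothetical keys first, last, email and phone. The nickname table is a tiny illustrative sample, not a real reference list.

```python
# Illustrative nickname table: nickname -> canonical first name.
NICKNAMES = {
    "bill": "william", "will": "william", "billy": "william",
    "bob": "robert", "rob": "robert",
    "liz": "elizabeth", "beth": "elizabeth",
}


def canonical_first_name(name):
    """Map a nickname to its canonical form; unknown names map to themselves."""
    n = name.strip().lower()
    return NICKNAMES.get(n, n)


def possible_duplicates(new, existing):
    """Existing customers with the same last name who also match on
    canonical first name, email, or phone."""
    matches = []
    for cust in existing:
        if cust["last"].lower() != new["last"].lower():
            continue
        same_first = (canonical_first_name(cust["first"])
                      == canonical_first_name(new["first"]))
        # A contact field only counts when the new record actually has a value for it.
        same_contact = any(
            new.get(f) and new.get(f) == cust.get(f) for f in ("email", "phone")
        )
        if same_first or same_contact:
            matches.append(cust)
    return matches


existing = [
    {"first": "William", "last": "Johnson", "email": "wj@example.com"},
    {"first": "Sarah", "last": "Johnson", "email": "sj@example.com"},
    {"first": "Bill", "last": "Smith"},
]
new_customer = {"first": "Bill", "last": "Johnson"}
print(possible_duplicates(new_customer, existing))
```

The results are suggestions to show the user ("did you mean one of these?"), not automatic merges, since two different people can share a last name and a phone number.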
You can use SOUNDEX to get similar-sounding names. However, it won't match William with Bill, for example.
Try this in SQL as an example.
SELECT SOUNDEX('John'), SOUNDEX('Jon')
There is some built-in SOUNDS LIKE functionality in SQL Server, see SOUNDEX http://msdn.microsoft.com/en-us/library/aa259235%28SQL.80%29.aspx
As for full name / nickname searching, there isn't anything built in that I am aware of. Nicknames vary by region and it's a lot of information to keep track of. There might be a database linking full names to nicknames that you could leverage in your own application.
My site has a database of widgets, each widget has a name and description. I want to implement a fast and relevant search system on my site, how do I do this?
As a side note, I also implemented tags on those widgets. Here is how I implemented them (tell me what you think):
Table widgets: has unique id plus other info for a widget
Table tag_words: Has unique ID and of course, the tag
Table tag-widget: Join table that associates a tag to a widget
I've created indexes on all of those columns to make search as fast as possible.
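For reference, the three-table layout described above can be sketched with Python's built-in sqlite3 module. The column names, the sample rows, and the spelling tag_widget (with an underscore, so the name needs no quoting) are assumptions; only the table roles come from the description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE widgets (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    description TEXT
);
CREATE TABLE tag_words (
    id INTEGER PRIMARY KEY,
    tag TEXT NOT NULL UNIQUE          -- UNIQUE also creates an index on tag
);
CREATE TABLE tag_widget (
    tag_id INTEGER NOT NULL REFERENCES tag_words(id),
    widget_id INTEGER NOT NULL REFERENCES widgets(id),
    PRIMARY KEY (tag_id, widget_id)   -- covers lookups by tag
);
-- Extra index for the reverse lookup: all tags of one widget.
CREATE INDEX idx_tag_widget_widget ON tag_widget(widget_id);
""")

conn.executemany(
    "INSERT INTO widgets (id, name, description) VALUES (?, ?, ?)",
    [(1, "Sprocket", "A toothed wheel"), (2, "Gear", "A rotating part"),
     (3, "Lever", "A rigid bar")],
)
conn.executemany("INSERT INTO tag_words (id, tag) VALUES (?, ?)",
                 [(1, "metal"), (2, "small")])
conn.executemany("INSERT INTO tag_widget (tag_id, widget_id) VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1)])


def widgets_with_tag(conn, tag):
    """Names of all widgets carrying the given tag, in alphabetical order."""
    rows = conn.execute(
        "SELECT w.name FROM widgets w "
        "JOIN tag_widget tw ON tw.widget_id = w.id "
        "JOIN tag_words t ON t.id = tw.tag_id "
        "WHERE t.tag = ? ORDER BY w.name",
        (tag,),
    ).fetchall()
    return [r[0] for r in rows]


print(widgets_with_tag(conn, "metal"))
```

A composite primary key on the join table doubles as the index for tag-to-widget lookups, so only the reverse direction needs a separate index.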
If you're using SQL Server, you can utilize full-text search. As for the tags, that's the best way for a small to medium site. StackOverflow denormalizes the tagging for faster reads, but for your app I would probably avoid that.
Lucene.NET is an alternative if you can't use the built-in full-text search from your database. It also gives you better control over what and how things are indexed.
Reading your side note about tags, I think I would really use Lucene.NET. It will allow you to decide how to rank the tags in relation to everything else, whereas I don't know if that is possible with SQL Server's full-text search.