Lucene searcher returns only one matching result - C#

My search text is "ma", and I have two Lucene documents that contain "ma" in their text, but the search returns only one document.
Below is the code:
// adding the document
document.Add(new Field("Text",text,Field.Store.YES, Field.Index.TOKENIZED));
//search logic :
IndexReader reader = IndexReader.Open(GetFileInfo(indexName));
//create an index searcher that will perform the search
IndexSearcher searcher = new IndexSearcher(reader);
//List of ID
List<string> searchResultID = new List<string>();
//build a query object
QueryParser parser = new QueryParser("Text", analyzer);
parser.SetAllowLeadingWildcard(true);
Query query = parser.Parse(searchText);
//execute the query
Hits hits = searcher.Search(query);

Maybe you could use Luke. It's a diagnostic tool that can display the contents of an existing Lucene index, among other things. I haven't used it myself, so I'm not sure, but I think it might help you debug this issue. Good luck!

I was able to solve my issue:
The index must be created only once. Check whether the index exists, and only if it does not, create an IndexWriter with the create flag set to true. For example:
// The last bool parameter of the IndexWriter constructor says whether to create a new index (true) or open the existing one (false)
IndexWriter writer = new IndexWriter(GetFileInfo(indexName), analyzer, true);
When adding a new document, check whether the index exists; if it does, pass false as the bool parameter of the IndexWriter constructor:
IndexWriter writer = new IndexWriter(GetFileInfo(indexName), analyzer, false);
writer.AddDocument(CreateDocument(Id, text, dateTime));
writer.Optimize();
writer.Close();
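The existence check described above could be sketched as follows. This is only a sketch, assuming the Lucene.Net 3.0.3 API (the question uses an older release whose constructor overloads differ); the IndexWriterFactory class, the OpenWriter helper, and the indexPath parameter are hypothetical names introduced for illustration.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Store;

public static class IndexWriterFactory
{
    // Hypothetical helper: opens a writer that creates the index only when none exists yet.
    public static IndexWriter OpenWriter(string indexPath)
    {
        Directory directory = FSDirectory.Open(new System.IO.DirectoryInfo(indexPath));
        Analyzer analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);

        // Passing create == true replaces any existing index, so do it only on the first run.
        bool create = !IndexReader.IndexExists(directory);

        return new IndexWriter(directory, analyzer, create, IndexWriter.MaxFieldLength.UNLIMITED);
    }
}
```

With a helper like this, every call site can add documents without caring whether it is the first run; documents written earlier are kept because the create flag is false once the index exists.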

Related

Lucene.NET TextField not being indexed

Using .NET 6.0 and Lucene.NET-4.8.0-beta00016 from NuGet
I am having an issue implementing the quickstart example from the website. When using TextField in a document, the field is not indexed. The search later in the BuildIndex method retrieves no results. If TextField is changed to StringField, the example works and the search returns a valid result.
Why does StringField work and TextField doesn't? I read that StringField is not analyzed but TextField is, so perhaps it's something to do with the StandardAnalyzer?
public class LuceneFullTextSearchService {
private readonly IndexWriter _writer;
private readonly Analyzer _standardAnalyzer;
public LuceneFullTextSearchService(string indexName)
{
// Compatibility version
const LuceneVersion luceneVersion = LuceneVersion.LUCENE_48;
string indexPath = Path.Combine(Environment.CurrentDirectory, indexName);
Directory indexDir = FSDirectory.Open(indexPath);
// Create an analyzer to process the text
_standardAnalyzer = new StandardAnalyzer(luceneVersion);
// Create an index writer
IndexWriterConfig indexConfig = new IndexWriterConfig(luceneVersion, _standardAnalyzer)
{
OpenMode = OpenMode.CREATE_OR_APPEND,
};
_writer = new IndexWriter(indexDir, indexConfig);
}
public void BuildIndex(string searchPath)
{
Document doc = new Document();
TextField docText = new TextField("title", "Apache", Field.Store.YES);
doc.Add(docText);
_writer.AddDocument(doc);
//Flush and commit the index data to the directory
_writer.Commit();
// Parse the user's query text
Query query = new TermQuery(new Term("title", "Apache"));
// Search
using DirectoryReader reader = _writer.GetReader(applyAllDeletes: true);
IndexSearcher searcher = new IndexSearcher(reader);
TopDocs topDocs = searcher.Search(query, n: 2);
// Show results
Document resultDoc = searcher.Doc(topDocs.ScoreDocs[0].Doc);
string title = resultDoc.Get("title");
}
}
StandardAnalyzer includes a LowerCaseFilter, so your text is indexed as lower-case terms (here, "apache").
However, a TermQuery is not analyzed, and the term you pass is "Apache" rather than "apache", so it doesn't produce any hits.
// Parse the user's query text
Query query = new TermQuery(new Term("title", "Apache"));
Option 1
Lowercase your search term.
// Parse the user's query text
Query query = new TermQuery(new Term("title", "Apache".ToLowerInvariant()));
Option 2
Use a QueryParser with the same analyzer you use to build the index.
QueryParser parser = new QueryParser(luceneVersion, "title", _standardAnalyzer);
Query query = parser.Parse("Apache");
The Lucene.Net.QueryParser package contains several implementations (the above example uses the Lucene.Net.QueryParsers.Classic.QueryParser).

Unable to delete the document from Lucene index

I'm new to Lucene.net. I've been trying to delete a document from a Lucene index, but unfortunately I couldn't get it done.
Here is my code.
public void DeleteDocuments()
{
Term term = new Term("id", id);
Analyzer analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_30);
Directory directory = FSDirectory.Open(filePath);
IndexWriter writer = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED);
writer.DeleteDocuments(term);
writer.Optimize();
writer.Commit();
writer.Dispose();
}
Unable to delete the document from the index file. Please help me out.
Thanks in advance.
Your problem might be the Analyzer you are using for the "id" field.
If your id includes letters, the StandardAnalyzer will index them in lower case. This means the term you create for the delete may differ from the term that was indexed, e.g. "ABC" vs "abc".
For id-type fields, you should ensure that the field is created with no analysis.
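As an illustration of the advice above, here is a minimal sketch assuming the Lucene.Net 3.0.3 API. The DeleteByIdExample class and IndexThenDelete method are hypothetical names, and a RAMDirectory stands in for the on-disk index from the question.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

public static class DeleteByIdExample
{
    // Indexes one document with an un-analyzed id, deletes it by that id,
    // and returns the number of live documents left in the index.
    public static int IndexThenDelete(string id)
    {
        Directory directory = new RAMDirectory();
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);

        using (var writer = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            var doc = new Document();
            // NOT_ANALYZED indexes the id as a single term exactly as given (case preserved).
            doc.Add(new Field("id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.AddDocument(doc);
            writer.Commit();

            // The Term is compared verbatim against the indexed term, with no analysis.
            writer.DeleteDocuments(new Term("id", id));
            writer.Commit();
        }

        using (IndexReader reader = IndexReader.Open(directory, true))
        {
            return reader.NumDocs();
        }
    }
}
```

If the field were indexed with Field.Index.ANALYZED instead, an id such as "ABC" would be indexed as "abc", and the delete term built from "ABC" would be expected to match nothing.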

Unable to get the searched document using Lucene.net

I'm new to Lucene.net. I have a situation where I need to search all the documents in a folder for a keyword entered by the user.
I've indexed all the files in the folder, built a query from the keywords entered by the user, and run the search.
The problem is that I get hits, but when I iterate over them I can't get the fields from the hit documents.
Here is my code.
public void Searching()
{
Analyzer analyzer = new StandardAnalyzer(luceneVersion.Version.LUCENE_29);
QueryParser parser = new QueryParser(luceneVersion.Version.LUCENE_29, "content", analyzer);
Query query = parser.Parse(txtSearchText.Text);
Directory directory = FSDirectory.Open(new System.IO.DirectoryInfo(txtIndexPath.Text.Trim()));
Searcher searcher = new IndexSearcher(IndexReader.Open(directory, true));
TopScoreDocCollector collector = TopScoreDocCollector.Create(100, true);
searcher.Search(query, collector);
ScoreDoc[] hits = collector.TopDocs().ScoreDocs;
foreach (ScoreDoc hit in hits)
{
int id = hit.Doc;
float score = hit.Score;
Document doc = searcher.Doc(id);
string content = doc.Get("content"); // null
}
}
When I debug, the content I get is null.
Am I missing anything in my code? This has been bugging me for half a day. Please help me out.
Thanks in advance.
I tried everything I could. The problem was that I had been indexing without storing the id field of the document in the index.
Here is the code I used while indexing.
doc.Add(new Field("id", id, Field.Store.NO, Field.Index.ANALYZED));
It should be like the following, so that the field value is stored and available in the index.
doc.Add(new Field("id", id, Field.Store.YES, Field.Index.ANALYZED));

Building Examine (lucene.net) index with comma separated list of IDs

I have an Umbraco website that is using Examine search which is based on lucene.net. I am pretty much trying to do exactly what is described in the following article:
Querying against a comma separated list of IDs with Examine and Lucene.Net?
The problem I have is when I am trying to create the index using the following code:
// Loop through articles
foreach (var a in articles)
{
yield return new SimpleDataSet()
{
NodeDefinition = new Examine.IndexedNode()
{
NodeId = a.Id,
Type = "Article"
},
RowData = new Dictionary<string, string>()
{
{"Name", a.Name},
{"Url", a.NiceUrl},
{"Category", "1234"},
{"Category", "5678"}
}
};
}
I am receiving the following error:
An item with the same key has already been added.
Does anyone know how I can get around this issue?
The next version of Examine (v2) will support this properly, with any luck that might be out within a couple months but that's really just dependent on how much time we get.
In the meantime, you could use the DocumentWriting event on your indexer which gives you direct access to the Lucene Document, then you can index however you like. So you could initially have a comma separated list of ids for your categories and during this event you could split them and add them as individual values in Lucene.
The error you are seeing is a restriction of .NET's Dictionary<TKey, TValue> class, as mentioned by @DavidH. The restriction is inherited from Examine's SimpleDataSet class, which, judging by the source, only allows a Dictionary<string, string> as a way of adding row data to a document.
However, a Lucene Document does allow you to add multiple fields with the same name as mentioned on the linked question:
using Lucene.Net.Documents;
var document = new Document();
document.Add(CreateField("Id", a.Id));
document.Add(CreateField("Name", a.Name));
document.Add(CreateField("Url", a.NiceUrl));
document.Add(CreateField("Category", "1234"));
document.Add(CreateField("Category", "5678"));
...
private Field CreateField(string fieldName, string fieldValue)
{
return new Field(
fieldName,
fieldValue,
Field.Store.YES,
Field.Index.ANALYZED);
}
Although not as convenient as Examine's API, using Lucene natively is a lot more flexible for these scenarios.
Here is a full example of doing it in Lucene. As said, Examine seems to limit the flexibility by taking its input as a Dictionary; however, changing Examine to handle this should be simple.
public static void Main (string[] args)
{
Analyzer analyser = new StandardAnalyzer (Lucene.Net.Util.Version.LUCENE_CURRENT);
Directory dir = new RAMDirectory ();
using (IndexWriter iw = new IndexWriter (dir, analyser, Lucene.Net.Index.IndexWriter.MaxFieldLength.UNLIMITED)) {
Document doc1 = new Document ();
doc1.Add (new Field("title", "multivalued", Field.Store.YES, Field.Index.ANALYZED));
doc1.Add (new Field("multival", "val1", Field.Store.YES, Field.Index.ANALYZED));
doc1.Add (new Field("multival", "val2", Field.Store.YES, Field.Index.ANALYZED));
iw.AddDocument (doc1);
Document doc2 = new Document ();
doc2.Add (new Field("title", "singlevalued", Field.Store.YES, Field.Index.ANALYZED));
doc2.Add (new Field("multival", "val1", Field.Store.YES, Field.Index.ANALYZED));
iw.AddDocument (doc2);
}
using (Searcher searcher = new IndexSearcher (dir, true)) {
var q1 = new TermQuery (new Term ("multival", "val1"));
var q1result = searcher.Search (q1, 1000);
//Will print "Found 2 documents"
Console.WriteLine ("Found {0} documents", q1result.TotalHits);
var q2 = new TermQuery (new Term ("multival", "val2"));
var q2result = searcher.Search (q2, 1000);
//Will print "Found 1 documents"
Console.WriteLine ("Found {0} documents", q2result.TotalHits);
}
}
The dictionary keys must be unique, and this is not specific to Lucene but instead to the .NET Dictionary<TKey, TValue> class. One possible option is to pipe delimit the values under one "Category" dictionary key, and then split on the pipe character to parse them out:
RowData = new Dictionary<string, string>()
{
{"Name", a.Name},
{"Url", a.NiceUrl},
{"Category", "1234|5678"}
}
You could then use string.Split on the pipe character '|' to parse them back out.
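A minimal sketch of that parsing step, using only the .NET base class library; the CategoryValues class and Parse method are hypothetical names, not part of Examine or Lucene.Net.

```csharp
using System;

public static class CategoryValues
{
    // Splits a pipe-delimited value such as "1234|5678" into its individual ids.
    public static string[] Parse(string delimited)
    {
        // RemoveEmptyEntries drops the empty strings produced by a leading, trailing, or doubled '|'.
        return delimited.Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

For example, CategoryValues.Parse("1234|5678") yields the two strings "1234" and "5678", which could then be added to the Lucene document as separate "Category" fields.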

Why is the Lucene.NET IndexSearcher returning zero results?

I recently started working with Lucene.NET and I have a problem: I used an IndexWriter to index my documents in C:\\TestIndex, which I guess worked since it generated several .fnm, .frq, .cfx, .tii, and .tis files.
The problem is that when I try a simple search over them, I never get any results back. Below is the code I use:
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;
//Provide the directory where index is stored
Directory directory = FSDirectory.Open(new System.IO.DirectoryInfo(@"C:\\TestIndex"));
IndexReader indexReader = IndexReader.Open(directory, true);
Searcher indexSearch = new IndexSearcher(indexReader);
Analyzer std = new StandardAnalyzer(Version.LUCENE_29);
QueryParser parser = new QueryParser(Version.LUCENE_29, "text", std);
Query qry = parser.Parse("morning");
// true opens the index in read only mode
Searcher srchr = new IndexSearcher(IndexReader.Open(directory, true));
TopScoreDocCollector cllctr = TopScoreDocCollector.Create(100, true);
ScoreDoc[] hits = cllctr.TopDocs().ScoreDocs;
srchr.Search(qry, cllctr);
for (int i = 0; i < hits.Length; i++)
{
int docId = hits[i].Doc;
float score = hits[i].Score;
Document doc = srchr.Doc(docId);
Console.WriteLine("Searched from Text: " + doc.Get("text"));
}
I tried several approaches but I never get any result. Do you have any idea?
Below is indexing code,
IndexWriter indexWriter =
new IndexWriter(
luceneDir,
new StandardAnalyzer(Version.LUCENE_29),
true,
IndexWriter.MaxFieldLength.UNLIMITED);
string[] listOfFiles = Directory.GetFiles(@"C:\Projects\lucene.net-trunk\build\vs2010\demo\MyTestProject\TestDocs");
foreach (string s in listOfFiles)
{
String content = File.ReadAllText(s);
Document doc = new Document();
String title = s;
// adding title field
doc.Add(new Field("title", title, Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.Add(new Field("content", content, Field.Store.YES, Field.Index.ANALYZED));
indexWriter.AddDocument(doc);
}
indexWriter.Optimize();
indexWriter.Dispose();
Use Luke to inspect the index and make sure it contains data. You can also run searches in it to validate your search criteria.
http://www.getopt.org/luke/
EDIT - (Luke works with both Lucene and Lucene.NET indexes; you will need to install Java to use it.)
EDIT
Update the line
Lucene.Net.QueryParsers.QueryParser parser = new Lucene.Net.QueryParsers.QueryParser(Lucene.Net.Util.Version.LUCENE_29, "text", std);
With
Lucene.Net.QueryParsers.QueryParser parser = new Lucene.Net.QueryParsers.QueryParser(Lucene.Net.Util.Version.LUCENE_29, "content", std);
You have set the default search field to "text", which doesn't exist in the index.
You are also fetching the wrong field in your Console.WriteLine call.
Make sure you use the same analyzer when indexing and searching (in your case it's StandardAnalyzer, I guess):
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
...
Directory directory = FSDirectory.Open(new System.IO.DirectoryInfo(@"C:\\TestIndex"));
var writer = new IndexWriter(
directory,
new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29),
true,
new MaxFieldLength(int.MaxValue));
UPDATE
I'm using a slightly different approach for searching but, anyway, maybe you need to swap these two lines:
ScoreDoc[] hits = cllctr.TopDocs().ScoreDocs;
srchr.Search(qry, cllctr);
So it becomes:
srchr.Search(qry, cllctr);
ScoreDoc[] hits = cllctr.TopDocs().ScoreDocs;
meaning that the collector first collects the results when the search is executed and then you get your scored documents via the collector instance.
Could you try explicitly specifying the field you're searching? For example:
Lucene.Net.QueryParsers.QueryParser parser = new Lucene.Net.QueryParsers.QueryParser(Lucene.Net.Util.Version.LUCENE_29, "text", std);
Lucene.Net.Search.Query qry = parser.Parse("content: morning");
I think that Lucene requires you to tell it on which field(s) (title, content...) you want to run your query.
