I am doing a project on search using Lucene.Net. We have created an index containing 100,000 documents with 5 fields, but when searching I'm unable to retrieve the records I expect. Can anybody help me? Why is that so?
My code looks like this:
List<int> ids = new List<int>();
List<Hits> hitList = new List<Hits>();
List<Document> results = new List<Document>();
int startPage = (pageIndex.Value - 1) * pageSize.Value;
string indexFileLocation = @"c:\ResourceIndex\"; //Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.ApplicationData), "ResourceIndex");
var fsDirectory = FSDirectory.Open(new DirectoryInfo(indexFileLocation));
Analyzer analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
IndexReader indexReader = IndexReader.Open(fsDirectory, true);
Searcher indexSearch = new IndexSearcher(indexReader);
//ids.AddRange(this.SearchPredicates(indexSearch, startPage, pageSize, query));

/* Searching the ResourceIndex */
Query resourceQuery = MultiFieldQueryParser.Parse(Lucene.Net.Util.Version.LUCENE_29,
    new string[] { productId.ToString(), languageIds, query },
    new string[] { "productId", "resourceLanguageIds", "externalIdentifier" },
    analyzer);

// Note: a TermQuery is never analyzed or parsed, so wrapping the query in
// quote characters would make the quotes part of the term itself.
TermQuery descriptionQuery = new TermQuery(new Term("description", query));
//TermQuery identifierQuery = new TermQuery(new Term("externalIdentifier", query));

BooleanQuery filterQuery = new BooleanQuery();
filterQuery.Add(descriptionQuery, BooleanClause.Occur.MUST);
//filterQuery.Add(identifierQuery, BooleanClause.Occur.MUST_NOT);

Filter filter = new CachingWrapperFilter(new QueryWrapperFilter(filterQuery));
TopScoreDocCollector collector = TopScoreDocCollector.create(100, true);

//Hits resourceHit = indexSearch.Search(resourceQuery, filter);
indexSearch.Search(resourceQuery, filter, collector);
ScoreDoc[] hits = collector.TopDocs().scoreDocs;

//for (int i = startPage; i <= pageSize && i < resourceHit.Length(); i++)
//{
//    ids.Add(Convert.ToInt32(resourceHit.Doc(i).GetField("id")));
//}

for (int i = 0; i < hits.Length; i++)
{
    int docId = hits[i].doc;
    float score = hits[i].score;
    Lucene.Net.Documents.Document doc = indexSearch.Doc(docId);
    string result = "Score: " + score.ToString() +
                    " Field: " + doc.Get("id");
}
You're calling Document.Get("id"), which returns the value of a stored field. It won't work unless the field was added with Field.Store.YES when indexing.
You could use the FieldCache if you've got the field indexed without analysis (Field.Index.NOT_ANALYZED) or with the KeywordAnalyzer, meaning one term per field and document.
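As an illustration, here is a minimal sketch of adding the id field at index time so that both stored-field retrieval and the FieldCache work (resourceId is a made-up variable standing in for your record's id):

var doc = new Document();
// Stored so Document.Get("id") can return it; NOT_ANALYZED so it is indexed
// as exactly one term per document, which the FieldCache requires.
doc.Add(new Field("id", resourceId.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
indexWriter.AddDocument(doc);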
You'll need to use the innermost reader for the FieldCache to work optimally. Here's a code paste from "FieldCache with frequently updated index" which uses the FieldCache properly, reading an integer value from the id field.
// Demo values, use existing code somewhere here.
var directory = FSDirectory.Open(new DirectoryInfo("index"));
var reader = IndexReader.Open(directory, readOnly: true);
var documentId = 1337;

// Grab all subreaders.
var subReaders = new List<IndexReader>();
ReaderUtil.GatherSubReaders(subReaders, reader);

// Loop through all subreaders. While subReaderId is higher than the
// maximum document id in the subreader, go to the next one.
var subReaderId = documentId;
var subReader = subReaders.First(sub =>
{
    if (sub.MaxDoc() < subReaderId)
    {
        subReaderId -= sub.MaxDoc();
        return false;
    }
    return true;
});

var values = FieldCache_Fields.DEFAULT.GetInts(subReader, "id");
var value = values[subReaderId];
I have difficulty understanding this example of how to use facets:
https://lucenenet.apache.org/docs/4.8.0-beta00008/api/Lucene.Net.Demo/Lucene.Net.Demo.Facet.SimpleFacetsExample.html
My goal is to create an index in which each document field has a facet, so that at search time I can choose which facets to use to navigate the data.
What I am confused about is the setup of facets at index-creation time. To summarize my question: is an index with facets compatible with ReferenceManager?
Does the DirectoryTaxonomyWriter need to be actually written and persisted on disk, or is it embedded into the index itself and just temporary? Given the line indexWriter.AddDocument(config.Build(taxoWriter, doc)); from the example, I would expect it to be temporary and embedded into the index (but then the example also shows that you need the taxonomy to drill down into facets). So can the taxonomy be tied to the index in some way, so that the two are handled together by ReferenceManager?
If not, may I just use the same folder I use for storing the index?
Here is a more detailed list of the points that confuse me:
In my scenario I index the documents asynchronously (in a background process) and then fetch the index as soon as possible through a ReferenceManager in an ASP.NET application. I hope this way of fetching the index is compatible with the DirectoryTaxonomyWriter needed by facets.
I then modified my code to introduce the taxonomy writer as indicated in the example, but I am a bit confused: it seems I can't store the DirectoryTaxonomyWriter in the same folder as the index, because the folder is locked. Do I need to persist it, or will it be embedded into the index (so that a RAMDirectory is enough)? And if I need to persist it in a different directory, can I safely persist it into a subdirectory?
Here is the code I am actually using:
private static void BuildIndex(IndexEntry entry)
{
    string targetFolder = ConfigurationManager.AppSettings["IndexFolder"] ?? string.Empty;

    //** LOG
    if (System.IO.Directory.Exists(targetFolder) == false)
    {
        string message = @"Index folder not found";
        _fileLogger.Error(message);
        _consoleLogger.Error(message);
        return;
    }

    var metadata = JsonConvert.DeserializeObject<IndexMetadata>(File.ReadAllText(entry.MetdataPath) ?? "{}");
    string[] header = new string[0];
    List<dynamic> csvRecords = new List<dynamic>();

    using (var reader = new StreamReader(entry.DataPath))
    {
        CsvConfiguration csvConfiguration = new CsvConfiguration(CultureInfo.InvariantCulture);
        csvConfiguration.AllowComments = false;
        csvConfiguration.CountBytes = false;
        csvConfiguration.Delimiter = ",";
        csvConfiguration.DetectColumnCountChanges = false;
        csvConfiguration.Encoding = Encoding.UTF8;
        csvConfiguration.HasHeaderRecord = true;
        csvConfiguration.IgnoreBlankLines = true;
        csvConfiguration.HeaderValidated = null;
        csvConfiguration.MissingFieldFound = null;
        csvConfiguration.TrimOptions = CsvHelper.Configuration.TrimOptions.None;
        csvConfiguration.BadDataFound = null;

        using (var csvReader = new CsvReader(reader, csvConfiguration))
        {
            csvReader.Read();
            csvReader.ReadHeader();
            csvReader.Read();
            header = csvReader.HeaderRecord;
            csvRecords = csvReader.GetRecords<dynamic>().ToList();
        }
    }

    string targetDirectory = Path.Combine(targetFolder, "Index__" + metadata.Boundle + "__" + DateTime.Now.ToString("yyyyMMdd_HHmmss") + "__" + Path.GetRandomFileName().Substring(0, 6));
    System.IO.Directory.CreateDirectory(targetDirectory);

    //** LOG
    {
        string message = @"..creating index : {0}";
        _fileLogger.Information(message, targetDirectory);
        _consoleLogger.Information(message, targetDirectory);
    }

    using (var dir = FSDirectory.Open(targetDirectory))
    {
        using (DirectoryTaxonomyWriter taxoWriter = new DirectoryTaxonomyWriter(dir))
        {
            Analyzer analyzer = metadata.GetAnalyzer();
            var indexConfig = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer);
            using (IndexWriter writer = new IndexWriter(dir, indexConfig))
            {
                long entryNumber = csvRecords.Count();
                long index = 0;
                long lastPercentage = 0;
                foreach (dynamic csvEntry in csvRecords)
                {
                    Document doc = new Document();
                    IDictionary<string, object> dynamicCsvEntry = (IDictionary<string, object>)csvEntry;
                    var indexedMetadataFiled = metadata.IdexedFields;
                    foreach (string headField in header)
                    {
                        if (indexedMetadataFiled.ContainsKey(headField) == false || (indexedMetadataFiled[headField].NeedToBeIndexed == false && indexedMetadataFiled[headField].NeedToBeStored == false))
                            continue;

                        var field = new Field(headField,
                            ((string)dynamicCsvEntry[headField] ?? string.Empty).ToLower(),
                            indexedMetadataFiled[headField].NeedToBeStored ? Field.Store.YES : Field.Store.NO,
                            indexedMetadataFiled[headField].NeedToBeIndexed ? Field.Index.ANALYZED : Field.Index.NO);
                        doc.Add(field);

                        var facetField = new FacetField(headField, (string)dynamicCsvEntry[headField]);
                        doc.Add(facetField);
                    }

                    long percentage = (long)(((decimal)index / (decimal)entryNumber) * 100m);
                    if (percentage > lastPercentage && percentage % 10 == 0)
                    {
                        _consoleLogger.Information($"..indexing {percentage}%..");
                        lastPercentage = percentage;
                    }

                    writer.AddDocument(doc);
                    index++;
                }
                writer.Commit();
            }
        }
    }

    //** LOG
    {
        string message = @"Index Created : {0}";
        _fileLogger.Information(message, targetDirectory);
        _consoleLogger.Information(message, targetDirectory);
    }
}
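For comparison, and hedged as my reading of the linked SimpleFacetsExample rather than a tested answer: the example keeps the taxonomy in its own directory, separate from the main index, and routes every document through FacetsConfig.Build, so the taxonomy is persisted on disk rather than embedded in the index. A minimal sketch (the folder names and the "SomeColumn" facet dimension are made up; metadata.GetAnalyzer() is reused from the code above):

using Lucene.Net.Facet;
using Lucene.Net.Facet.Taxonomy.Directory;

// Two separate directories: one for the index, one for the taxonomy.
// A separate taxonomy folder also avoids the lock conflict described above.
using (var indexDir = FSDirectory.Open(Path.Combine(targetFolder, "index")))
using (var taxoDir = FSDirectory.Open(Path.Combine(targetFolder, "taxonomy")))
using (var writer = new IndexWriter(indexDir, new IndexWriterConfig(LuceneVersion.LUCENE_48, metadata.GetAnalyzer())))
using (var taxoWriter = new DirectoryTaxonomyWriter(taxoDir))
{
    var config = new FacetsConfig();
    var doc = new Document();
    doc.Add(new FacetField("SomeColumn", "some value"));
    // Build translates FacetFields into taxonomy ordinals plus hidden index
    // fields; both writers must be committed.
    writer.AddDocument(config.Build(taxoWriter, doc));
    writer.Commit();
    taxoWriter.Commit();
}

On the search side, SearcherTaxonomyManager is the ReferenceManager variant designed for exactly this pairing: it refreshes the IndexSearcher and the TaxonomyReader together.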
I'm porting an application to ASP.NET 5.0 with EF7 and have found several problems. One of the issues is that MS has dropped DataTable. I'm trying to rework a bit of code so it doesn't use DataTable but instead reads from a SqlDataReader into the entities I have. I have to read the data by columns, but it looks like a data reader can only be read once.
The old code:
Series[] arrSeries = new Series[dt.Columns.Count - 1];
IList<Categories> arrCats = new List<Categories>();
Categories arrCat = new Categories();
foreach (DataColumn dc in dt.Columns)
{
    var strarr = dt.Rows.Cast<DataRow>().Select(row => row[dc.Ordinal]).ToList();
    if (dc.Ordinal == 0)
    {
        arrCat.category = strarr.Select(o => new Category { label = o.ToString() }).ToList();
    }
    else
    {
        Series s = new Series()
        {
            seriesname = dc.ColumnName,
            renderas = null,
            showvalues = false,
            data = strarr.Select(o => new SeriesValue { value = o.ToString() }).ToList()
        };
        arrSeries[dc.Ordinal - 1] = s;
    }
}
arrCats.Add(arrCat);

MultiFusionChart fusChart = new MultiFusionChart
{
    chart = dictAtts,
    categories = arrCats,
    dataset = arrSeries
};
return fusChart;
The new code:
Series[] arrSeries = new Series[colColl.Count - 1];
IList<Categories> arrCats = new List<Categories>();
Categories arrCat = new Categories();
arrCat.category = new List<Category>();
for (int i = 0; i < reader.FieldCount; i++)
{
    Series s = new Series()
    {
        seriesname = reader.GetName(i),
        renderas = null,
        showvalues = false
    };
    while (reader.Read())
    {
        if (i == 0)
        {
            Category cat = new Category();
            cat.label = reader.GetValue(i).ToString();
            arrCat.category.Add(cat);
        }
        else
        {
            SeriesValue sv = new SeriesValue();
            sv.value = reader.GetValue(i).ToString();
            s.data.Add(sv);
            arrSeries[i - 1] = s;
        }
    }
}
arrCats.Add(arrCat);

MultiFusionChart fusChart = new MultiFusionChart
{
    chart = dictAtts,
    categories = arrCats,
    dataset = arrSeries
};
return fusChart;
While the code runs, it returns null for Series, and I believe this is because the reader had already reached its end while recording the Categories. As far as I know it is not possible to reset a DataReader?
Is there a way to load column data from a DataReader into a List? Or is there some other way to replace DataTable in this example?
You can read values from multiple columns by specifying the column index, like this:

int totalColumns = reader.FieldCount;
for (int i = 0; i < totalColumns; i++)
{
    var label = reader.GetValue(i);
}

You can get the value of every column by looping through the columns. GetValue takes an int argument, which is the index of the column, not the index of the row.
The problem is in your for loop. At first, when i is 0, you initialize a Series object s. After that,
while (reader.Read())
reads your entire reader while i is still zero, so only the
if (i == 0)
branch ever runs and the whole reader is spent. Afterwards, for i >= 1,
while (reader.Read())
always returns false, leaving your array arrSeries empty. Restructure the loops: read each row exactly once, and inside that single pass pull the value of each column directly from the reader with
reader.GetValue(i);
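A sketch of that restructuring, reusing the types from the question (untested against your schema):

// Create one Series per data column up front so values can be appended
// while the rows stream past exactly once.
Series[] arrSeries = new Series[reader.FieldCount - 1];
for (int i = 1; i < reader.FieldCount; i++)
{
    arrSeries[i - 1] = new Series
    {
        seriesname = reader.GetName(i),
        renderas = null,
        showvalues = false,
        data = new List<SeriesValue>()
    };
}

Categories arrCat = new Categories { category = new List<Category>() };

// Single pass over the rows: column 0 feeds the category labels,
// every other column feeds its own series.
while (reader.Read())
{
    arrCat.category.Add(new Category { label = reader.GetValue(0).ToString() });
    for (int i = 1; i < reader.FieldCount; i++)
    {
        arrSeries[i - 1].data.Add(new SeriesValue { value = reader.GetValue(i).ToString() });
    }
}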
I am working on implementing search using Lucene.Net in my e-commerce site. I want to sort the results by a user-selected criterion. It might be Name, Price, Discount, NewArrivals, etc.
My code:
var sortBy = new Lucene.Net.Search.Sort(new Lucene.Net.Search.SortField("Name", Lucene.Net.Search.SortField.STRING, true));
//var parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_30, "Name", analyzer);
//Query searchQuery = parser.Parse(query);
//TopDocs hits = searcher.Search(searchQuery, int.MaxValue);
//Sort sort=new Sort
//TopDocs hits = searcher.Search(QueryMaker(query, searchfields, storeId), int.MaxValue);
TopFieldDocs hits = searcher.Search(QueryMaker(query, searchfields, storeId), null, int.MaxValue, sortBy);
int results = hits.TotalHits;
Product searchResult = null;
if (results < limit)
    limit = results;
//if (limit == 0)
//    limit = int.MaxValue;
for (int i = (pagenumber * limit) + 1; i <= ((pagenumber * limit) + limit); i++)
{
    try
    {
        Document doc = searcher.Doc(hits.ScoreDocs[i - 1].Doc);
        searchResult = new Product();
        searchResult.Id = Convert.ToInt32(doc.Get("Id"));
        searchResult.Name = doc.Get("Name");
        searchResult.SeName = doc.Get("SeName");
        searchResult.StoreId = Convert.ToInt32(doc.Get("StoreId"));
        searchResults.Add(searchResult);
        searchResult = null;
    }
    catch (Exception ae)
    {
    }
}
But this doesn't give me the expected result; it does not sort the products by Name.
Any solutions?
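A likely cause, offered as an assumption since the indexing code isn't shown: Lucene can only sort reliably on a field that yields a single term per document, so sorting on an analyzed "Name" field won't behave. A sketch of adding a dedicated untokenized sort field (the "NameSort" name and the product variable are made up for illustration):

// Analyzed field for searching, stored for display.
doc.Add(new Field("Name", product.Name, Field.Store.YES, Field.Index.ANALYZED));
// Untokenized, lowercased copy used only for sorting: one term per document.
doc.Add(new Field("NameSort", product.Name.ToLower(), Field.Store.NO, Field.Index.NOT_ANALYZED));

Then sort on the untokenized field at search time:

var sortBy = new Sort(new SortField("NameSort", SortField.STRING, false));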
Below is how I did the search with Lucene.Net. This routine searches multiple words against all of the indexed fields: "Title", "Description", "Url", "Country".
I need to know how I can add a condition like where country='UK' or country='US'.
I want the multiple words to be searched as below, but I want to add one more clause: that the country is UK. Please guide me on what to add in my code.
if (!string.IsNullOrEmpty(multiWordPhrase))
{
    string[] fieldList = { "Title", "Description", "Url" };
    List<BooleanClause.Occur> occurs = new List<BooleanClause.Occur>();
    foreach (string field in fieldList)
    {
        occurs.Add(BooleanClause.Occur.SHOULD);
    }
    searcher = new IndexSearcher(_directory, false);
    Query qry = MultiFieldQueryParser.Parse(Version.LUCENE_29, multiWordPhrase, fieldList, occurs.ToArray(), new StandardAnalyzer(Version.LUCENE_29));
    TopDocs topDocs = searcher.Search(qry, null, ((PageIndex + 1) * PageSize), Sort.RELEVANCE);
    ScoreDoc[] scoreDocs = topDocs.ScoreDocs;
    int resultsCount = topDocs.TotalHits;
    list.HasData = resultsCount;
    StartRecPos = (PageIndex * PageSize) + 1;
    if (topDocs != null)
    {
        for (int i = (PageIndex * PageSize); i <= (((PageIndex + 1) * PageSize) - 1) && i < topDocs.ScoreDocs.Length; i++)
        {
            Document doc = searcher.Doc(topDocs.ScoreDocs[i].doc);
            oSr = new Result();
            oSr.ID = doc.Get("ID");
            oSr.Title = doc.Get("Title");
            oSr.Description = doc.Get("Description");
            //oSr.WordCount = AllExtension.WordCount(oSr.Description, WordExist(oSr.Title, multiWordPhrase));
            string preview =
                oSr.Description = BBAReman.AllExtension.HighlightKeywords(oSr.Description, multiWordPhrase); //sr.Description;
            oSr.Url = doc.Get("Url");
            TmpEndRecpos++;
            list.Result.Add(oSr);
        }
    }
}
Thanks.
Look up BooleanQuery
if (!string.IsNullOrEmpty(multiWordPhrase))
{
    BooleanQuery bq = new BooleanQuery();
    string[] fieldList = { "Title", "Description", "Url" };
    List<BooleanClause.Occur> occurs = new List<BooleanClause.Occur>();
    foreach (string field in fieldList)
    {
        occurs.Add(BooleanClause.Occur.SHOULD);
    }
    Query qry = MultiFieldQueryParser.Parse(Version.LUCENE_29, multiWordPhrase, fieldList, occurs.ToArray(), new StandardAnalyzer(Version.LUCENE_29));
    bq.Add(qry, BooleanClause.Occur.MUST);

    // This is the country query (change the Country field name to whatever you have).
    string country = "UK";
    Query q2 = new QueryParser(Version.LUCENE_29, "Country", new StandardAnalyzer(Version.LUCENE_29)).Parse(country);
    bq.Add(q2, BooleanClause.Occur.MUST);

    searcher = new IndexSearcher(_directory, false);
    TopDocs topDocs = searcher.Search(bq, null, ((PageIndex + 1) * PageSize), Sort.RELEVANCE);
    ScoreDoc[] scoreDocs = topDocs.ScoreDocs;
    int resultsCount = topDocs.TotalHits;
    list.HasData = resultsCount;
    StartRecPos = (PageIndex * PageSize) + 1;
    if (topDocs != null)
    {
        //loop through your results
    }
}
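To match UK or US, as asked, one way (a sketch, not tested against your index) is to nest a second BooleanQuery of SHOULD clauses and add it as a MUST clause:

// Either country may match, but at least one of them must.
BooleanQuery countryQuery = new BooleanQuery();
countryQuery.Add(new TermQuery(new Term("Country", "uk")), BooleanClause.Occur.SHOULD);
countryQuery.Add(new TermQuery(new Term("Country", "us")), BooleanClause.Occur.SHOULD);
bq.Add(countryQuery, BooleanClause.Occur.MUST);

Note that TermQuery terms are not analyzed, so the lowercase values here assume the Country field was indexed through the StandardAnalyzer, which lowercases.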
I am trying to get boosting to work, so I can boost docs and/or fields to shape the search results the way I'd like.
However, I am unable to make boosting docs or fields have ANY effect at all on the scoring.
Either Lucene.Net boosting does not work (not very likely) or I am misunderstanding something (very likely).
Here is my showcase code, stripped down to the bare essentials:
using System;
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

namespace SO_LuceneTest
{
    class Program
    {
        static void Main(string[] args)
        {
            const string INDEXNAME = "TextIndex";
            var writer = new IndexWriter(INDEXNAME, new SimpleAnalyzer(), true);
            writer.DeleteAll();

            var persons = new Dictionary<string, string>
            {
                { "Smithers", "Jansen" },
                { "Jan", "Smith" }
            };

            foreach (var p in persons)
            {
                var doc = new Document();
                var firstnameField = new Field("Firstname", p.Key, Field.Store.YES, Field.Index.ANALYZED);
                var lastnameField = new Field("Lastname", p.Value, Field.Store.YES, Field.Index.ANALYZED);
                //firstnameField.SetBoost(2.0f);
                doc.Add(firstnameField);
                doc.Add(lastnameField);
                writer.AddDocument(doc);
            }

            writer.Commit();
            writer.Close();

            var term = "jan*";
            var queryFields = new string[] { "Firstname", "Lastname" };
            var boosts = new Dictionary<string, float>();
            //boosts.Add("Firstname", 10);
            QueryParser mqp = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_24, queryFields, new SimpleAnalyzer(), boosts);
            var query = mqp.Parse(term);

            IndexSearcher searcher = new IndexSearcher(INDEXNAME);
            Hits hits = searcher.Search(query);
            int results = hits.Length();
            Console.WriteLine("Found {0} results", results);
            for (int i = 0; i < results; i++)
            {
                Document doc = hits.Doc(i);
                Console.WriteLine("{0} {1}\t\t{2}", doc.Get("Firstname"), doc.Get("Lastname"), hits.Score(i));
            }
            searcher.Close();

            Console.WriteLine("...");
            Console.Read();
        }
    }
}
I have commented out two instances of boosting. When they are included, the scores are still exactly the same as without boosting.
What am I missing here?
I am using Lucene.Net v2.9.2.2, the latest version as of now.
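One thing worth checking, offered as an assumption rather than a confirmed diagnosis: "jan*" parses to a prefix query, and since version 2.9 Lucene rewrites wildcard/prefix queries to a constant-score form by default, which can flatten exactly the score differences boosting is supposed to create. If that is the cause, switching the parser back to a scoring rewrite should make the boosts visible:

// Rewrite wildcard/prefix queries into scoring BooleanQueries instead of
// the constant-score default, so norms and boosts affect the score.
mqp.SetMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);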
Please try the approach below and see if it works; it does for me, but you will have to modify it, because I have lots of other code that I won't include in this post unless necessary. The main difference is the use of a TopFieldCollector to get the results.
var dir = SimpleFSDirectory.Open(new DirectoryInfo(IndexPath));
var ixSearcher = new IndexSearcher(dir, false);
var qp = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, f_Text, analyzer);
query = CleanQuery(query);
Query q = qp.Parse(query);

TopFieldCollector collector = TopFieldCollector.Create(
    new Sort(new SortField(null, SortField.SCORE, false), new SortField(f_Date, SortField.LONG, true)),
    MAX_RESULTS,
    false,  // fillFields - not needed, we want score and doc only
    true,   // trackDocScores - need doc and score fields
    true,   // trackMaxScore - related to trackDocScores
    false); // docsScoredInOrder - should docs be in docId order?
ixSearcher.Search(q, collector);
TopDocs topDocs = collector.TopDocs();
ScoreDoc[] hits = topDocs.ScoreDocs;

uint pageCount = (uint)Math.Ceiling((double)hits.Length / pageSize);
for (uint i = pageIndex * pageSize; i < (pageIndex + 1) * pageSize; i++)
{
    if (i >= hits.Length)
    {
        break;
    }
    int doc = hits[i].Doc;
    Content c = new Content
    {
        Title = ixSearcher.Doc(doc).GetField(f_Title).StringValue(),
        Text = FragmentOnOrgText(ixSearcher.Doc(doc).GetField(f_TextOrg).StringValue(), highligter.GetBestFragments(analyzer, ixSearcher.Doc(doc).GetField(f_Text).StringValue(), maxNumberOfFragments)),
        Date = DateTools.StringToDate(ixSearcher.Doc(doc).GetField(f_Date).StringValue()),
        Score = hits[i].Score
    };
    rv.Add(c);
}
ixSearcher.Close();