I am having difficulty understanding this example of how to use facets:
https://lucenenet.apache.org/docs/4.8.0-beta00008/api/Lucene.Net.Demo/Lucene.Net.Demo.Facet.SimpleFacetsExample.html
My goal is to create an index in which each document field has a facet, so that at search time I can choose which facets to use to navigate the data.
What confuses me is the setup of facets at index-creation time. To summarize my question: is an index with facets compatible with ReferenceManager?
Does the DirectoryTaxonomyWriter actually need to be written and persisted on disk, or is it embedded into the index itself and just temporary? Given the line indexWriter.AddDocument(config.Build(taxoWriter, doc)); from the example, I would expect it to be temporary and embedded into the index (but then the example also shows that you need the taxonomy to drill down on facets). So can the taxonomy be tied to the index in some way so that the two are handled together by ReferenceManager?
If not, may I just use the same folder I use for storing the index?
Here is a more detailed list of the points that confuse me:
In my scenario I am indexing the documents asynchronously (in a background process) and then fetching the index as soon as possible through ReferenceManager in an ASP.NET application. I hope this way of fetching the index is compatible with the DirectoryTaxonomyWriter needed by facets.
I then modified my code to introduce the taxonomy writer as indicated in the example, but I am a bit confused: it seems I can't store the DirectoryTaxonomyWriter in the same folder as the index, because the folder is locked. Do I need to persist it, or will it be embedded into the index (so that a RAMDirectory is enough)? If I need to persist it in a different directory, can I safely persist it into a subdirectory?
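To make the question concrete, here is what I imagine the read side would look like if the taxonomy lives in a subdirectory. This is only a sketch of my assumption: I have not verified that SearcherTaxonomyManager is the right ReferenceManager counterpart here, and the indexFolder variable and "taxo" subfolder are placeholders.
// Sketch only: indexFolder and the "taxo" subfolder are my assumptions.
var indexDir = FSDirectory.Open(indexFolder);
var taxoDir = FSDirectory.Open(Path.Combine(indexFolder, "taxo"));
// SearcherTaxonomyManager is a ReferenceManager<SearcherAndTaxonomy>,
// so it should refresh the searcher and the taxonomy reader together.
var manager = new SearcherTaxonomyManager(indexDir, taxoDir, new SearcherFactory());
var instance = manager.Acquire();
try
{
    IndexSearcher searcher = instance.Searcher;
    TaxonomyReader taxoReader = instance.TaxonomyReader;
    // ...search and collect facets here...
}
finally
{
    manager.Release(instance);
}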
Here is the code I am actually using:
private static void BuildIndex (IndexEntry entry)
{
string targetFolder = ConfigurationManager.AppSettings["IndexFolder"] ?? string.Empty;
//** LOG
if (System.IO.Directory.Exists(targetFolder) == false)
{
string message = #"Index folder not found";
_fileLogger.Error(message);
_consoleLogger.Error(message);
return;
}
var metadata = JsonConvert.DeserializeObject<IndexMetadata>(File.ReadAllText(entry.MetdataPath) ?? "{}");
string[] header = new string[0];
List<dynamic> csvRecords = new List<dynamic>();
using (var reader = new StreamReader(entry.DataPath))
{
CsvConfiguration csvConfiguration = new CsvConfiguration(CultureInfo.InvariantCulture);
csvConfiguration.AllowComments = false;
csvConfiguration.CountBytes = false;
csvConfiguration.Delimiter = ",";
csvConfiguration.DetectColumnCountChanges = false;
csvConfiguration.Encoding = Encoding.UTF8;
csvConfiguration.HasHeaderRecord = true;
csvConfiguration.IgnoreBlankLines = true;
csvConfiguration.HeaderValidated = null;
csvConfiguration.MissingFieldFound = null;
csvConfiguration.TrimOptions = CsvHelper.Configuration.TrimOptions.None;
csvConfiguration.BadDataFound = null;
using (var csvReader = new CsvReader(reader, csvConfiguration))
{
csvReader.Read();
csvReader.ReadHeader();
csvReader.Read();
header = csvReader.HeaderRecord;
csvRecords = csvReader.GetRecords<dynamic>().ToList();
}
}
string targetDirectory = Path.Combine(targetFolder, "Index__" + metadata.Boundle + "__" + DateTime.Now.ToString("yyyyMMdd_HHmmss") + "__" + Path.GetRandomFileName().Substring(0, 6));
System.IO.Directory.CreateDirectory(targetDirectory);
//** LOG
{
string message = #"..creating index : {0}";
_fileLogger.Information(message, targetDirectory);
_consoleLogger.Information(message, targetDirectory);
}
using (var dir = FSDirectory.Open(targetDirectory))
{
// NOTE: same Directory instance as the IndexWriter below -- this is
// where I hit the "folder is locked" problem described above.
using (DirectoryTaxonomyWriter taxoWriter = new DirectoryTaxonomyWriter(dir))
{
Analyzer analyzer = metadata.GetAnalyzer();
var indexConfig = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer);
using (IndexWriter writer = new IndexWriter(dir, indexConfig))
{
long entryNumber = csvRecords.Count();
long index = 0;
long lastPercentage = 0;
foreach (dynamic csvEntry in csvRecords)
{
Document doc = new Document();
IDictionary<string, object> dynamicCsvEntry = (IDictionary<string, object>)csvEntry;
var indexedMetadataFiled = metadata.IdexedFields;
foreach (string headField in header)
{
if (indexedMetadataFiled.ContainsKey(headField) == false || (indexedMetadataFiled[headField].NeedToBeIndexed == false && indexedMetadataFiled[headField].NeedToBeStored == false))
continue;
var field = new Field(headField,
((string)dynamicCsvEntry[headField] ?? string.Empty).ToLower(),
indexedMetadataFiled[headField].NeedToBeStored ? Field.Store.YES : Field.Store.NO,
indexedMetadataFiled[headField].NeedToBeIndexed ? Field.Index.ANALYZED : Field.Index.NO
);
doc.Add(field);
var facetField = new FacetField(headField, (string)dynamicCsvEntry[headField]);
doc.Add(facetField);
}
long percentage = (long)(((decimal)index / (decimal)entryNumber) * 100m);
if (percentage > lastPercentage && percentage % 10 == 0)
{
_consoleLogger.Information($"..indexing {percentage}%..");
lastPercentage = percentage;
}
writer.AddDocument(doc);
index++;
}
writer.Commit();
}
}
}
//** LOG
{
string message = #"Index Created : {0}";
_fileLogger.Information(message, targetDirectory);
_consoleLogger.Information(message, targetDirectory);
}
}
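For comparison, here is how I read the demo's indexing side, mapped onto my code. Again just a sketch: the separate "taxo" subfolder and the documents collection are my assumptions, not something the demo prescribes.
// Sketch: separate Directory instances for the index and the taxonomy.
using (var indexDir = FSDirectory.Open(targetDirectory))
using (var taxoDir = FSDirectory.Open(Path.Combine(targetDirectory, "taxo")))
using (var taxoWriter = new DirectoryTaxonomyWriter(taxoDir))
using (var writer = new IndexWriter(indexDir, new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer)))
{
    var config = new FacetsConfig();
    foreach (Document doc in documents)
    {
        // config.Build translates the FacetField entries and records
        // their ordinals in the taxonomy, as in the demo code.
        writer.AddDocument(config.Build(taxoWriter, doc));
    }
    // Both writers keep their own files, so both need a commit.
    writer.Commit();
    taxoWriter.Commit();
}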
Related question:
As stated in the title, I am trying to implement search over documents that are indexed with many analyzers.
Documents are always lower case (so there is no issue with casing), but I have issues with multiple analyzers and partial search.
Examples of the issues:
If I search for 'grap', it matches 'group' (which is fine, I use fuzzy search) but does not match 'graphical'.
Another issue: if the "hint" accompanying a document tells the indexer to use a specific language (and therefore not the StandardAnalyzer during indexing), the query is then unable to fetch data.
I have read that this is because the query should also be built with the right analyzer, but I haven't understood how to achieve that.
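My current understanding is that the query text should be passed through the same analyzer that was used at index time, for example via a QueryParser. A sketch of that idea, reusing the names from my code (assuming metadata.GetAnalyzer() returns the analyzer the index was built with):
// Sketch: parse the user input with the index-time analyzer.
Analyzer analyzer = metadata.GetAnalyzer();
var parser = new QueryParser(LuceneVersion.LUCENE_48, field.FieldName, analyzer);
Query parsed = parser.Parse(QueryParserBase.Escape(query));
composedQuery.Add(parsed, Occur.SHOULD);
I have not managed to fit this into my code below, which currently builds FuzzyQuery terms directly.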
Here is my code:
public List<JObject> Search(string boundle, string query, LuceneHint luceneHint, int pageIndex, int itemsPerPage, string key)
{
//TODO : security check
if (string.IsNullOrEmpty(query))
throw new ArgumentNullException("query");
if (query.Length < 3)
throw new ArgumentException("query parameter too short, should be at least 3 characters long.");
if ((luceneHint?.FieldsToSearch?.Any() ?? false) == false)
throw new ArgumentNullException("luceneHint");
if (pageIndex < 0)
pageIndex = 0;
if (itemsPerPage < 1)
itemsPerPage = int.MaxValue;
if ((luceneHint?.Top ?? 0) < itemsPerPage)
itemsPerPage = luceneHint.Top;
var tokens = Regex.Split(query.Trim(), @"\W+");
//Here some test i made with query
//Query composedQuery = new MatchAllDocsQuery();
/*BooleanQuery composedQuery = new BooleanQuery();
foreach (var field in luceneHint.FieldsToSearch)
{
PhraseQuery phraseQuery = new PhraseQuery();
foreach (string word in tokens)
{
phraseQuery.Add(new Term(field.FieldName, word));
}
phraseQuery.Boost = (float)field.Weight;
composedQuery.Add(phraseQuery, Occur.SHOULD);
}*/
BooleanQuery composedQuery = new BooleanQuery();
foreach (var field in luceneHint.FieldsToSearch)
{
foreach (string word in tokens)
{
if (string.IsNullOrWhiteSpace(word))
continue;
var termQuery = new FuzzyQuery(new Term(field.FieldName, word.ToLower()));
termQuery.Boost = (float)field.Weight;
composedQuery.Add(termQuery, Occur.SHOULD);
}
}
var indexManager = IndexManager.Instance;
ReferenceManager<IndexSearcher> index = indexManager.Read(boundle); // IndexManager is a utility that binds a bundle to a folder, and thus to an index
int resultLimit = luceneHint?.Top ?? RESULT_LIMIT;
var results = new List<JObject>();
var searcher = index.Acquire();
try
{
Dictionary<string, FieldDescriptor> filedToRead = (luceneHint?.FieldsToRead?.Any() ?? false) ?
luceneHint.FieldsToRead.ToDictionary(item => item.FieldName, item => item) :
new Dictionary<string, FieldDescriptor>();
bool fetchEveryField = filedToRead.Count == 0;
TopScoreDocCollector collector = TopScoreDocCollector.Create(resultLimit, true);
int startPageIndex = pageIndex * itemsPerPage;
searcher.Search(composedQuery, collector);
//TopDocs topDocs = searcher.Search(composedQuery, luceneHint?.Top ?? 100);
TopDocs topDocs = collector.GetTopDocs(startPageIndex, itemsPerPage);
foreach (var scoreDoc in topDocs.ScoreDocs)
{
Document doc = searcher.Doc(scoreDoc.Doc);
dynamic result = new JObject();
foreach (var field in doc.Fields)
if (fetchEveryField || filedToRead.ContainsKey(field.Name))
result[field.Name] = field.GetStringValue();
results.Add(result);
}
}
finally
{
if ( searcher != null )
index.Release(searcher);
}
return results;
}
The read-bundle part basically caches and returns the result of this code:
foreach ( var boundle in boundles)
result.Add(new BoundleEntry() { Bounde = boundle.Key, Index = new SearcherManager(FSDirectory.Open(boundle.Value.OrderByDescending(folder => folder).FirstOrDefault()), new SearcherFactory()) });
So basically I pass the method a string referring to a given bundle, and a cache is queried to see if I can get the right SearcherManager (computed as above). If the cache is empty or the bundle is missing, the method above scans for the missing bundle and loads the corresponding SearcherManager into the cache.
In the code above, the variable "boundle" is not a string but a data structure (its key is the string used to query the cache) that keeps track of the folder-version/bundle-key association.
Important: I have noticed that I misspelled a word in my source code; wherever you see "boundle", read "bundle".
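To make the caching part concrete, here is roughly the lookup (a simplified sketch; the cache dictionary and ScanFoldersFor are placeholders for my actual utility code):
// Simplified sketch of IndexManager.Read: return the cached SearcherManager
// for a bundle key, scanning the folders only on a cache miss.
private readonly ConcurrentDictionary<string, SearcherManager> _cache =
    new ConcurrentDictionary<string, SearcherManager>();

public SearcherManager Read(string bundleKey)
{
    return _cache.GetOrAdd(bundleKey, key =>
    {
        // Pick the newest folder version for this bundle (placeholder logic).
        string newestFolder = ScanFoldersFor(key).OrderByDescending(f => f).First();
        return new SearcherManager(FSDirectory.Open(newestFolder), new SearcherFactory());
    });
}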
In an ASP.NET MVC4 application, I'm using the following code to process a GoToWebinar attendees report (CSV format).
For some reason, the file that is being loaded is not released by IIS, and this causes issues when attempting to process another file.
Do you see anything out of the ordinary here?
The CSVHelper (CsvReader) is from https://joshclose.github.io/CsvHelper/
public AttendeesData GetRecords(string filename, string webinarKey)
{
StreamReader sr = new StreamReader(Server.MapPath(filename));
CsvReader csvread = new CsvReader(sr);
csvread.Configuration.HasHeaderRecord = false;
List<AttendeeRecord> record = csvread.GetRecords<AttendeeRecord>().ToList();
record.RemoveRange(0, 7);
AttendeesData attdata = new AttendeesData();
attdata.Attendees = new List<Attendee>();
foreach (var rec in record)
{
Attendee aa = new Attendee();
aa.Webinarkey = webinarKey;
aa.FullName = String.Concat(rec.First_Name, " ", rec.Last_Name);
aa.AttendedWebinar = 0;
aa.Email = rec.Email_Address;
aa.JoinTime = rec.Join_Time.Replace(" CST", "");
aa.LeaveTime = rec.Leave_Time.Replace(" CST", "");
aa.TimeInSession = rec.Time_in_Session.Replace("hour", "hr").Replace("minute", "min");
aa.Makeup = 0;
aa.RegistrantKey = Registrants.Where(x => x.email == rec.Email_Address).FirstOrDefault().registrantKey;
List<string> firstPolls = new List<string>()
{
rec.Poll_1.Trim(), rec.Poll_2.Trim(),rec.Poll_3.Trim(),rec.Poll_4.Trim()
};
int pass1 = firstPolls.Count(x => x != "");
List<string> secondPolls = new List<string>()
{
rec.Poll_5.Trim(), rec.Poll_6.Trim(),rec.Poll_7.Trim(),rec.Poll_8.Trim()
};
int pass2 = secondPolls.Count(x => x != "");
aa.FirstPollCount = pass1;
aa.SecondPollCount = pass2;
if (aa.TimeInSession != "")
{
aa.AttendedWebinar = 1;
}
if (aa.FirstPollCount == 0 || aa.SecondPollCount == 0)
{
aa.AttendedWebinar = 0;
}
attdata.Attendees.Add(aa);
attendeeToDB(aa); // adds to Oracle DB using EF6.
}
// Should I call csvread.Dispose() here?
sr.Close();
return attdata;
}
Yes. You have to dispose of those objects too:
sr.Close();
csvread.Dispose();
sr.Dispose();
A better strategy is to use the using keyword (see the sketch below).
You should use usings for your StreamReaders and StreamWriters.
You should follow naming conventions (lists always contain multiple entries, so rename record to records).
You should use clear names (not aa).
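For example, the reader setup could be wrapped like this (a sketch of the pattern against the CsvHelper version used in the question, not a full rewrite of the method):
// Both readers are disposed even if an exception is thrown while reading.
using (var sr = new StreamReader(Server.MapPath(filename)))
using (var csvread = new CsvReader(sr))
{
    csvread.Configuration.HasHeaderRecord = false;
    List<AttendeeRecord> records = csvread.GetRecords<AttendeeRecord>().ToList();
    // ...process the records as before...
}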
How do I check whether a location is indexed or not? I found the following code to index a location in Windows, which works fine, but I want to check whether the location is already indexed before I add it.
Uri path = new Uri(location);
string indexingPath = path.AbsoluteUri;
CSearchManager csm = new CSearchManager();
CSearchCrawlScopeManager manager = csm.GetCatalog("SystemIndex").GetCrawlScopeManager();
manager.AddUserScopeRule(indexingPath, 1, 1, 0);
manager.SaveAll();
I have found a way to check whether the location has been included for indexing, using IncludedInCrawlScope:
CSearchManager csm = new CSearchManager();
CSearchCrawlScopeManager manager = csm.GetCatalog("SystemIndex").GetCrawlScopeManager();
if (manager.IncludedInCrawlScope(indexingPath) == 0)
{
manager.AddUserScopeRule(indexingPath, 1, 1, 0);
manager.SaveAll();
}
But this only checks whether the location has been added for indexing, not whether indexing is complete. Since I will be querying the SystemIndex, I need to make sure that the location is indexed.
I ran into a similar need, and this is what I came up with. In my case I have certain file extensions that end up being sent to a document management system.
I have two methods. The first uses System.IO to get a list of the files in the directory whose extensions appear in the list:
public IEnumerable<string> DirectoryScan(string directory)
{
List<string> extensions = new List<string>
{
"docx","xlsx","pptx","docm","xlsm","pptm","dotx","xltx","xlw","potx","ppsx","ppsm","doc","xls","ppt","doct","xlt","xlm","pot","pps"
};
IEnumerable<string> myFiles =
Directory.GetFiles(directory, "*", SearchOption.AllDirectories)
.Where(s => extensions.Any(s.EndsWith))
.ToList();
return myFiles;
}
The second method uses the Windows index search (Microsoft.Search.Interop):
public IEnumerable<string> QueryWindowsDesktopSearch(string directory)
{
List<string> extensions = new List<string>
{ "docx","xlsx","pptx","docm","xlsm","pptm","dotx","xltx","xlw","potx","ppsx","ppsm","doc","xls","ppt","doct","xlt","xlm","pot","pps"};
string userQuery = "*";
Boolean fShowQuery = true;
List<string> list = new List<string>();
CSearchManager manager = new CSearchManager();
CSearchCatalogManager catalogManager = manager.GetCatalog("SystemIndex");
CSearchQueryHelper queryHelper = catalogManager.GetQueryHelper();
queryHelper.QueryWhereRestrictions = string.Format("AND (\"SCOPE\" = 'file:{0}')", directory);
if (extensions != null)
{
queryHelper.QueryWhereRestrictions += " AND Contains(System.ItemType,'";
bool fFirst = true;
foreach (string ext in extensions)
{
if (!fFirst)
{
queryHelper.QueryWhereRestrictions += " OR ";
}
queryHelper.QueryWhereRestrictions += "\"" + ext + "\"";
fFirst = false;
}
queryHelper.QueryWhereRestrictions += "') ";
}
string sqlQuery = queryHelper.GenerateSQLFromUserQuery(userQuery);
using (OleDbConnection connection = new OleDbConnection(queryHelper.ConnectionString))
{
using (OleDbCommand command = new OleDbCommand(sqlQuery, connection))
{
connection.Open();
OleDbDataReader dataReader = command.ExecuteReader();
while (dataReader.Read())
{
var file = dataReader.GetString(0);
if (file != null)
{
list.Add(file.Replace("file:", ""));
}
}
}
}
return list;
}
I call both of these methods from another method that takes the two results, compares them, and returns a Boolean value indicating whether the two lists match. If they do not match, the folder has not been fully indexed.
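A sketch of that comparison, assuming the two methods shown above:
// Returns true when the index already knows every file that is on disk,
// i.e. the two lists match (order- and case-insensitive paths).
public bool IsFolderFullyIndexed(string directory)
{
    IEnumerable<string> onDisk = DirectoryScan(directory);
    IEnumerable<string> inIndex = QueryWindowsDesktopSearch(directory);
    return new HashSet<string>(onDisk, StringComparer.OrdinalIgnoreCase).SetEquals(inIndex);
}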
If you call QueryWindowsDesktopSearch on a folder that has not been indexed, it returns zero files. You could use this as an indication that the folder isn't in the index, but it's also possible that the folder has been added to the index while indexing is currently stopped.
You could check the status by calling something like this:
CSearchManager manager = new CSearchManager();
CSearchCatalogManager catalogManager = manager.GetCatalog("SystemIndex");
_CatalogPausedReason pReason;
_CatalogStatus pStatus;
catalogManager.GetCatalogStatus(out pStatus, out pReason);
That may return something like pStatus = CATALOG_STATUS_PAUSED and pReason = CATALOG_PAUSED_REASON_USER_ACTIVE, in which case you know the indexer is not running. Another thing you could do is call the following:
int incrementalCount, notificationQueue, highPriorityQueue;
catalogManager.NumberOfItemsToIndex(out incrementalCount, out notificationQueue, out highPriorityQueue);
This returns, in incrementalCount (plIncrementalCount in the COM signature), the number of items that the entire SystemIndex has queued for indexing.
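So one heuristic (my reading, not something the API guarantees) is to treat the folder as fully indexed only when the file lists match and the queue is drained:
// Heuristic sketch: queue drained and file lists equal => folder indexed.
int incrementalCount, notificationQueue, highPriorityQueue;
catalogManager.NumberOfItemsToIndex(out incrementalCount, out notificationQueue, out highPriorityQueue);
bool queueDrained = incrementalCount == 0;
bool folderIndexed = queueDrained && IsFolderFullyIndexed(directory); // comparison sketched earlier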
Check this implementation from a document management system:
https://code.google.com/p/olakedms/source/browse/SearchEngine/CSearchDAL.cs?r=171
I have the following method to take an XLSX file and convert it to an XDocument:
public static XDocument ConvertXlsx2Xml(string fileName, string sheetName)
{
// Return the value of the specified cell.
const string documentRelationshipType = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";
const string worksheetSchema = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
const string sharedStringsRelationshipType = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/sharedStrings";
const string sharedStringSchema = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
string cellValue = null;
var xsurvey = new XDocument(new XDeclaration("1.0", "UTF-8", "yes"));
var xroot = new XElement("Root"); //Create the root
using (Package xlPackage = Package.Open(fileName, FileMode.Open, FileAccess.Read))
{
PackagePart documentPart = null;
Uri documentUri = null;
// Get the main document part (workbook.xml).
foreach (System.IO.Packaging.PackageRelationship relationship in xlPackage.GetRelationshipsByType(documentRelationshipType))
{
// There should only be one document part in the package.
documentUri = PackUriHelper.ResolvePartUri(new Uri("/", UriKind.Relative), relationship.TargetUri);
documentPart = xlPackage.GetPart(documentUri);
// There should only be one instance, but get out no matter what.
break;
}
if (documentPart != null)
{
// Load the contents of the workbook.
var doc = new XmlDocument();
doc.Load(documentPart.GetStream());
/*
doc now contains the following important nodes:
<bookViews>
<workbookView xWindow="-15615" yWindow="2535" windowWidth="26835" windowHeight="13095" activeTab="2" />
<sheets>
<sheet name="Sheet1" sheetId="2" r:id="rId1" />
*/
// Create a namespace manager, so you can search.
// Add a prefix (d) for the default namespace.
var nt = new NameTable();
var nsManager = new XmlNamespaceManager(nt);
nsManager.AddNamespace("d", worksheetSchema);
nsManager.AddNamespace("s", sharedStringSchema);
//If value for sheetName isn't found, take the first sheet
string searchString = string.Format("//d:sheet[@name='{0}']", sheetName);
XmlNode sheetNode = doc.SelectSingleNode(searchString, nsManager) ??
doc.SelectSingleNode("//d:sheet", nsManager);
/*
* 11/15/12 DS Added to avoid pulling the data each time
* Create a dictionary of the shared strings from the associated string file
*/
#region Shared String Dictionary
var sharedStrings = new Dictionary<int, string>();
foreach (System.IO.Packaging.PackageRelationship stringRelationship in documentPart.GetRelationshipsByType(sharedStringsRelationshipType))
{
// There should only be one shared string reference, so you exit this loop immediately.
Uri sharedStringsUri = PackUriHelper.ResolvePartUri(documentUri, stringRelationship.TargetUri);
PackagePart stringPart = xlPackage.GetPart(sharedStringsUri);
{
// Load the contents of the shared strings.
var stringDoc = new XmlDocument(nt);
stringDoc.Load(stringPart.GetStream());
nsManager.AddNamespace("s", sharedStringSchema);
const string strSearch = "//s:sst";
XmlNode stringNode = stringDoc.SelectSingleNode(strSearch, nsManager);
int keyInt = 0;
if (stringNode != null)
foreach (XmlElement nd in stringNode)
{
//string test = nd.InnerText;
sharedStrings.Add(keyInt, nd.InnerText);
keyInt = keyInt + 1;
}
}
}
#endregion
var hrowList = new List<string>();
var hrowArray = new string[] {};
if (sheetNode != null && sheetNode.Attributes != null)
{
// Get the relId attribute:
XmlAttribute relationAttribute = sheetNode.Attributes["r:id"];
if (relationAttribute != null)
{
string relId = relationAttribute.Value;
// First, get the relation between the document and the sheet.
PackageRelationship sheetRelation = documentPart.GetRelationship(relId);
Uri sheetUri = PackUriHelper.ResolvePartUri(documentUri, sheetRelation.TargetUri);
PackagePart sheetPart = xlPackage.GetPart(sheetUri);
// Load the contents of the workbook.
var sheetDoc = new XmlDocument(nt);
sheetDoc.Load(sheetPart.GetStream());
/*
* sheetDoc now contains the following important nodes:
* <dimension ref="A1:V81" /> range of sheet data
* <sheetData>
* <row r="1" spans="1:22"> <row> r = row number, spans = columns containing the data
* <c r="A1" t="s"><v>0</v></c> <c> r = Cell address (A1,B4,etc), t = data type ("s"=string,"b"=bool, null=decimal)
* <v> contents are the index num if t="s", or value of t=null
*/
XmlNode sheetDataNode = sheetDoc.SelectSingleNode("//d:sheetData", nsManager);
int roNum = 0;
if (sheetDataNode != null)
{
var isSkip = false;
foreach (XmlElement row in sheetDataNode)
{
var xrow = new XElement("Row");
foreach (XmlElement cell in row)
{
XmlAttribute typeAttr = cell.Attributes["t"];
string cellType = typeAttr != null ? typeAttr.Value : string.Empty;
XmlNode valueNode = cell.SelectSingleNode("d:v", nsManager);
cellValue = valueNode != null ? valueNode.InnerText : cellValue;
// Check the cell type. At this point, this code only checks for booleans and strings individually.
switch (cellType)
{
case "b":
cellValue = cellValue == "1" ? "TRUE" : "FALSE";
break;
case "s":
cellValue = sharedStrings[Convert.ToInt32(cellValue)];
break;
}
if (cellValue == null) continue;
cellValue = cellValue.Replace("\r", "");
cellValue = cellValue.Replace("\n", " ");
cellValue = cellValue.Trim();
if (roNum == 0)
{
hrowList.Add(cellValue);
}
else
{
//XmlAttribute rowAttr = cell.Attributes["r"];
//int intStart = rowAttr.Value.IndexOfAny("0123456789".ToCharArray());
//colLet = rowAttr.Value.Substring(0, intStart);
//int colNum = NumberFromExcelColumn(colLet);
int colNum = GetColNum(cell);
/* 05/29/13 DS force column names to UPPER to remove case sensitivity */
var xvar = new XElement(hrowArray[colNum - 1].ToUpper());
xvar.SetValue(cellValue);
xrow.Add(xvar);
}
/* 6/18/2013 DS You must clear cellValue so it is not carried into the next cell when that cell is empty. */
cellValue = "";
}
if (roNum == 0) hrowArray = hrowList.ToArray();
else xroot.Add(xrow);
roNum = roNum + 1;
}
}
}
}
}
}
xsurvey.Add(xroot);
return xsurvey;
}
For the most part, it works very well. However, I have just noticed that if one of the cell values contains a number like 0.004, it becomes 4.0000000000000001E-3.
The resulting XML gets imported and that value is loaded as a string, but before the final transfer to the production tables this particular field is converted to numeric, and that format doesn't parse as numeric.
How do I prevent that change on load? If I can't, is there a better way to prevent a system error, other than specifically scrubbing that field and changing it back as part of the transfer process?
UPDATE
It is only numbers less than .01 that have a problem. 1, 1.004, and .04 are fine, but .004 is not.
UPDATE 2
If I format the cells as text BEFORE populating the data, I do not have this issue. There is something about how ManEx stores the data that prevents a clean upload.
Read the value as a double using the invariant culture, and then convert it to a decimal:
var cellValueDouble = Convert.ToDouble(cellValue,System.Globalization.CultureInfo.InvariantCulture);
var cellValueDecimal = Convert.ToDecimal(cellValueDouble);
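For example, with the problem value from the question (invariant culture assumed throughout):
// "4.0000000000000001E-3" narrows back to 0.004 when converted to decimal.
var cellValue = "4.0000000000000001E-3";
var cellValueDouble = Convert.ToDouble(cellValue, System.Globalization.CultureInfo.InvariantCulture);
var cellValueDecimal = Convert.ToDecimal(cellValueDouble);
Console.WriteLine(cellValueDecimal.ToString(System.Globalization.CultureInfo.InvariantCulture)); // prints 0.004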
I am doing a project on search using Lucene.Net. We have created an index which contains 100,000 documents with 5 fields. But while searching, I am unable to retrieve the correct record. Can anybody help me understand why?
My code looks like this:
List<int> ids = new List<int>();
List<Hits> hitList = new List<Hits>();
List<Document> results = new List<Document>();
int startPage = (pageIndex.Value - 1) * pageSize.Value;
string indexFileLocation = @"c:\ResourceIndex\"; //Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.ApplicationData), "ResourceIndex");
var fsDirectory = FSDirectory.Open(new DirectoryInfo(indexFileLocation));
Analyzer analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
IndexReader indexReader = IndexReader.Open(fsDirectory, true);
Searcher indexSearch = new IndexSearcher(indexReader);
//ids.AddRange(this.SearchPredicates(indexSearch, startPage, pageSize, query));
/*Searching From the ResourceIndex*/
Query resourceQuery = MultiFieldQueryParser.Parse(Lucene.Net.Util.Version.LUCENE_29,
new string[] { productId.ToString(), languagelds, query },
new string[] { "productId", "resourceLanguageIds", "externalIdentifier" },
analyzer);
TermQuery descriptionQuery = new TermQuery(new Term("description", '"'+query+'"'));
//TermQuery identifierQuery = new TermQuery(new Term("externalIdentifier", query));
BooleanQuery filterQuery = new BooleanQuery();
filterQuery.Add(descriptionQuery, BooleanClause.Occur.MUST);
//filterQuery.Add(identifierQuery,BooleanClause.Occur.MUST_NOT);
Filter filter = new CachingWrapperFilter(new QueryWrapperFilter(filterQuery));
TopScoreDocCollector collector = TopScoreDocCollector.create(100, true);
//Hits resourceHit = indexSearch.Search(resourceQuery, filter);
indexSearch.Search(resourceQuery, filter, collector);
ScoreDoc[] hits = collector.TopDocs().scoreDocs;
//for (int i = startPage; i <= pageSize && i < resourceHit.Length(); i++)
//{
// ids.Add(Convert.ToInt32(resourceHit.Doc(i).GetField("id")));
//}
for (int i = 0; i < hits.Length; i++)
{
int docId = hits[i].doc;
float score = hits[i].score;
Lucene.Net.Documents.Document doc = indexSearch.Doc(docId);
string result = "Score: " + score.ToString() +
" Field: " + doc.Get("id");
}
You're calling Document.Get("id"), which returns the value of a stored field. It won't work without Field.Store.YES when indexing.
You could use the FieldCache if you've got the field indexed without analysis (Field.Index.NOT_ANALYZED) or with the KeywordAnalyzer, meaning one term per field and document.
You'll need to use the innermost reader for the FieldCache to work optimally. Here's a code paste from FieldCache with frequently updated index which uses the FieldCache properly, reading an integer value from the id field:
// Demo values, use existing code somewhere here.
var directory = FSDirectory.Open(new DirectoryInfo("index"));
var reader = IndexReader.Open(directory, readOnly: true);
var documentId = 1337;
// Grab all subreaders.
var subReaders = new List<IndexReader>();
ReaderUtil.GatherSubReaders(subReaders, reader);
// Loop through all subreaders. While subReaderId is higher than the
// maximum document id in the subreader, go to next.
var subReaderId = documentId;
var subReader = subReaders.First(sub => {
if (sub.MaxDoc() < subReaderId) {
subReaderId -= sub.MaxDoc();
return false;
}
return true;
});
var values = FieldCache_Fields.DEFAULT.GetInts(subReader, "id");
var value = values[subReaderId];