Lucene.Net (4.8) AutoComplete / AutoSuggestion - c#

I'd like to implement a searchable index using Lucene.Net 4.8 that supplies a user with suggestions / autocomplete for single words & phrases.
The index has been created successfully; the suggestions are where I've stalled.
Version 4.8 seems to have introduced a substantial number of breaking changes, and none of the available samples I've found work.
Where I stand
For reference, LuceneVersion is this:
private readonly LuceneVersion LuceneVersion = LuceneVersion.LUCENE_48;
Solution 1
I've tried this, but can't get past reader.Terms:
public void TryAutoComplete()
{
var analyzer = new EnglishAnalyzer(LuceneVersion);
var config = new IndexWriterConfig(LuceneVersion, analyzer);
RAMDirectory dir = new RAMDirectory();
using (IndexWriter iw = new IndexWriter(dir, config))
{
Document d = new Document();
TextField f = new TextField("text","",Field.Store.YES);
d.Add(f);
f.SetStringValue("abc");
iw.AddDocument(d);
f.SetStringValue("colorado");
iw.AddDocument(d);
f.SetStringValue("coloring book");
iw.AddDocument(d);
iw.Commit();
using (IndexReader reader = iw.GetReader(false))
{
TermEnum terms = reader.Terms(new Term("text", "co"));
int maxSuggestsCpt = 0;
// will print:
// colorado
// coloring book
do
{
Console.WriteLine(terms.Term.Text);
maxSuggestsCpt++;
if (maxSuggestsCpt >= 5)
break;
}
while (terms.Next() && terms.Term.Text.StartsWith("co"));
}
}
}
reader.Terms no longer exists. Since I'm new to Lucene, I'm not sure how to refactor this.
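In the 4.x API, reader.Terms(Term) was replaced by a per-field term dictionary (Terms) that you walk with a TermsEnum. SeekCeil positions the enum on the first term that is >= the given bytes, which is what the old reader.Terms(new Term("text", "co")) did. The following is a minimal sketch that assumes Lucene.Net 4.8.0-beta00016 member names (MultiFields.GetTerms, Terms.GetEnumerator, TermsEnum.SeekCeil/MoveNext/Term); older betas named some of these differently. TermPrefixSuggester and RunExample are hypothetical names. Note that the question's EnglishAnalyzer splits and stems text, so the index holds individual tokens such as "book" rather than the phrase "coloring book". For that reason the sketch indexes with StandardAnalyzer, which only lowercases and splits.

```csharp
// Sketch for Lucene.Net 4.8.0-beta (packages: Lucene.Net, Lucene.Net.Analysis.Common).
// Older 4.8 betas used terms.GetIterator(null) and termsEnum.Next() instead.
using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Lucene.Net.Util;

public static class TermPrefixSuggester
{
    // Returns up to maxResults distinct indexed terms of `field` that start with `prefix`.
    public static List<string> Suggest(IndexReader reader, string field, string prefix, int maxResults)
    {
        var results = new List<string>();
        Terms terms = MultiFields.GetTerms(reader, field); // null when the field has no indexed terms
        if (terms == null) return results;

        TermsEnum termsEnum = terms.GetEnumerator();
        var prefixBytes = new BytesRef(prefix);

        // Terms are sorted, so seek to the first term >= prefix (the old reader.Terms(new Term(...))).
        if (termsEnum.SeekCeil(prefixBytes) == TermsEnum.SeekStatus.END) return results;

        do
        {
            // Stop as soon as we leave the run of terms that share the prefix.
            if (!StringHelper.StartsWith(termsEnum.Term, prefixBytes)) break;
            results.Add(termsEnum.Term.Utf8ToString());
        }
        while (results.Count < maxResults && termsEnum.MoveNext());

        return results;
    }

    // Rebuilds the question's small index and asks for terms starting with "co".
    public static List<string> RunExample()
    {
        const LuceneVersion version = LuceneVersion.LUCENE_48;
        using var dir = new RAMDirectory();
        using var analyzer = new StandardAnalyzer(version);
        using (var iw = new IndexWriter(dir, new IndexWriterConfig(version, analyzer)))
        {
            foreach (var text in new[] { "abc", "colorado", "coloring book" })
                iw.AddDocument(new Document { new TextField("text", text, Field.Store.YES) });
            iw.Commit();
        }
        using var reader = DirectoryReader.Open(dir);
        return Suggest(reader, "text", "co", 5);
    }
}
```

Because the suggestions come from indexed tokens, user input should be normalized the same way the analyzer normalizes text (at least lowercased) before calling Suggest. With StandardAnalyzer, "Atl" only matches the indexed token "atlanta" after it is lowercased to "atl".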
Solution 2
Trying this throws an error:
public void TryAutoComplete2()
{
using(var analyzer = new EnglishAnalyzer(LuceneVersion))
{
IndexWriterConfig config = new IndexWriterConfig(LuceneVersion, analyzer);
RAMDirectory dir = new RAMDirectory();
using(var iw = new IndexWriter(dir,config))
{
Document d = new Document()
{
new TextField("text", "this is a document with a some words",Field.Store.YES),
new Int32Field("id", 42, Field.Store.YES)
};
iw.AddDocument(d);
iw.Commit();
using (IndexReader reader = iw.GetReader(false))
using (SpellChecker speller = new SpellChecker(new RAMDirectory()))
{
//ERROR HERE!!!
speller.IndexDictionary(new LuceneDictionary(reader, "text"), config, false);
string[] suggestions = speller.SuggestSimilar("dcument", 5);
IndexSearcher searcher = new IndexSearcher(reader);
foreach (string suggestion in suggestions)
{
TopDocs docs = searcher.Search(new TermQuery(new Term("text", suggestion)), null, Int32.MaxValue);
foreach (var doc in docs.ScoreDocs)
{
System.Diagnostics.Debug.WriteLine(searcher.Doc(doc.Doc).Get("id"));
}
}
}
}
}
}
When debugging, speller.IndexDictionary(new LuceneDictionary(reader, "text"), config, false); throws an error with the message "The object cannot be set twice!", which I can't explain.
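As far as I can tell, the message comes from Lucene's SetOnce guard. In Lucene 4.x, an IndexWriterConfig is bound to the first IndexWriter created with it. SpellChecker.IndexDictionary opens its own internal IndexWriter with the config you pass, so handing it the config that iw already uses fails. The sketch below assumes that explanation and gives the spell checker its own config. It targets the Lucene.Net 4.8 beta packages (Lucene.Net, Lucene.Net.Analysis.Common, Lucene.Net.Suggest). SpellCheckDemo and SuggestFor are hypothetical names, and the sample text and misspelling come from the question.

```csharp
// Sketch for Lucene.Net 4.8.0-beta (packages: Lucene.Net, Lucene.Net.Analysis.Common, Lucene.Net.Suggest).
using Lucene.Net.Analysis.En;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search.Spell;
using Lucene.Net.Store;
using Lucene.Net.Util;

public static class SpellCheckDemo
{
    const LuceneVersion Version = LuceneVersion.LUCENE_48;

    public static string[] SuggestFor(string text, string misspelled, int count)
    {
        using var analyzer = new EnglishAnalyzer(Version);
        using var dir = new RAMDirectory();

        // Config #1 belongs to this writer only.
        using (var iw = new IndexWriter(dir, new IndexWriterConfig(Version, analyzer)))
        {
            iw.AddDocument(new Document { new TextField("text", text, Field.Store.YES) });
            iw.Commit();
        }

        using var reader = DirectoryReader.Open(dir);
        using var spellDir = new RAMDirectory();
        using var speller = new SpellChecker(spellDir);

        // Config #2: IndexDictionary creates its own IndexWriter, so it needs a config
        // that no other writer has used yet.
        speller.IndexDictionary(new LuceneDictionary(reader, "text"), new IndexWriterConfig(Version, analyzer), false);
        return speller.SuggestSimilar(misspelled, count);
    }
}
```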
Any thoughts are welcome.
Clarification
I'd like to return a list of suggested terms for a given input, not the documents or their full content.
For example, if a document contains "Hello, my name is Clark. I'm from Atlanta," and I submit "Atl," then "Atlanta" should come back as a suggestion.
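The requested behaviour amounts to a lookup in a sorted term dictionary. You seek to the first term that is >= the lowercased input, then read forward while the terms still start with it. The plain C# sketch below models that idea without Lucene. The regex split is a crude, assumed stand-in for an analyzer, and TermDictionarySketch is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TermDictionarySketch
{
    // Crude stand-in for an analyzer: lowercase, split on anything that is not a letter or digit,
    // then keep the distinct terms in ordinal (byte-like) order.
    public static string[] BuildSortedTerms(IEnumerable<string> documents) =>
        documents
            .SelectMany(d => Regex.Split(d.ToLowerInvariant(), @"[^\p{L}\p{Nd}]+"))
            .Where(t => t.Length > 0)
            .Distinct()
            .OrderBy(t => t, StringComparer.Ordinal)
            .ToArray();

    // Seek to the first term >= prefix, then read forward while terms still share the prefix.
    public static List<string> Suggest(string[] sortedTerms, string input, int max)
    {
        string prefix = input.ToLowerInvariant();
        int i = Array.BinarySearch(sortedTerms, prefix, StringComparer.Ordinal);
        if (i < 0) i = ~i; // not found: ~i is the index of the next larger term

        var results = new List<string>();
        for (; i < sortedTerms.Length && results.Count < max; i++)
        {
            if (!sortedTerms[i].StartsWith(prefix, StringComparison.Ordinal)) break;
            results.Add(sortedTerms[i]);
        }
        return results;
    }
}
```

With the sentence from the example above, Suggest(terms, "Atl", 5) yields only "atlanta".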

If I understand you correctly, you may be over-complicating your index design a bit. If your goal is to use Lucene for autocomplete, create an index of the terms you consider complete. Then query that index with a PrefixQuery built from the partial word or phrase.
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.En;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;
using System;
using System.Linq;
namespace LuceneDemoApp
{
class LuceneAutoCompleteIndex : IDisposable
{
const LuceneVersion Version = LuceneVersion.LUCENE_48;
RAMDirectory Directory;
Analyzer Analyzer;
// An IndexWriterConfig can be bound to only one IndexWriter; reusing it for a second writer
// throws "The object cannot be set twice!", so each writer gets a fresh config.
private IndexWriterConfig CreateWriterConfig()
{
return new IndexWriterConfig(Version, Analyzer) { OpenMode = OpenMode.CREATE_OR_APPEND };
}
private void IndexDoc(IndexWriter writer, string term)
{
Document doc = new Document();
doc.Add(new StringField(FieldName, term, Field.Store.YES));
writer.AddDocument(doc);
}
public LuceneAutoCompleteIndex(string fieldName, int maxResults)
{
FieldName = fieldName;
MaxResults = maxResults;
Directory = new RAMDirectory();
Analyzer = new EnglishAnalyzer(Version);
}
public string FieldName { get; }
public int MaxResults { get; set; }
public void Add(string term)
{
using (var writer = new IndexWriter(Directory, CreateWriterConfig()))
{
IndexDoc(writer, term);
}
}
public void AddRange(string[] terms)
{
using (var writer = new IndexWriter(Directory, CreateWriterConfig()))
{
foreach (string term in terms)
{
IndexDoc(writer, term);
}
}
}
public string[] WhereStartsWith(string term)
{
using (var reader = DirectoryReader.Open(Directory))
{
IndexSearcher searcher = new IndexSearcher(reader);
var query = new PrefixQuery(new Term(FieldName, term));
TopDocs foundDocs = searcher.Search(query, MaxResults);
var matches = foundDocs.ScoreDocs
.Select(scoreDoc => searcher.Doc(scoreDoc.Doc).Get(FieldName))
.ToArray();
return matches;
}
}
public void Dispose()
{
Directory.Dispose();
Analyzer.Dispose();
}
}
}
Running this:
var indexValues = new string[] { "apple fruit", "appricot", "ape", "avacado", "banana", "pear" };
var index = new LuceneAutoCompleteIndex("fn", 10);
index.AddRange(indexValues);
var matches = index.WhereStartsWith("app");
foreach (var match in matches)
{
Console.WriteLine(match);
}
You get this:
apple fruit
appricot

Related

How to highlight only results of PrefixQuery in Lucene and not whole words?

I'm fairly new to Lucene and may be doing something really wrong, so please correct me if that's the case. I've been searching for an answer for a few days now and I'm not sure where to go from here.
The goal is to use Lucene.NET to search for user names with partial matching (like StartsWith) and highlight only the matched parts. For instance, if I search for abc in a list of ['a', 'ab', 'abc', 'abcd', 'abcde'], it should return just the last three, in the form ['<b>abc</b>', '<b>abc</b>d', '<b>abc</b>de'].
Here is how I approached this.
First the index creation:
using var indexDir = FSDirectory.Open(Path.Combine(IndexDirectory, IndexName));
using var standardAnalyzer = new StandardAnalyzer(CurrentVersion);
var indexConfig = new IndexWriterConfig(CurrentVersion, standardAnalyzer);
indexConfig.OpenMode = OpenMode.CREATE_OR_APPEND;
using var indexWriter = new IndexWriter(indexDir, indexConfig);
if (indexWriter.NumDocs == 0)
{
//fill the index with Documents
}
The documents are created like this:
static Document BuildClientDocument(int id, string surname, string name)
{
var document = new Document()
{
new StringField("Id", id.ToString(), Field.Store.YES),
new TextField("Surname", surname, Field.Store.YES),
new TextField("Surname_sort", surname.ToLower(), Field.Store.NO),
new TextField("Name", name, Field.Store.YES),
new TextField("Name_sort", name.ToLower(), Field.Store.NO),
};
return document;
}
The search is done like this:
using var multiReader = new MultiReader(indexWriter.GetReader(true)); //the plan was to use multiple indexes per entity types
var indexSearcher = new IndexSearcher(multiReader);
var queryString = "abc"; //just as a sample
var queryWords = queryString.SplitWords();
var query = new BooleanQuery();
queryWords
.Process((word, index) =>
{
var boolean = new BooleanQuery()
{
{ new PrefixQuery(new Term("Surname", word)) { Boost = 100 }, Occur.SHOULD }, //surnames are most important to match
{ new PrefixQuery(new Term("Name", word)) { Boost = 50 }, Occur.SHOULD }, //names are less important
};
boolean.Boost = (queryWords.Count() - index); //first words in a search query are more important than others
query.Add(boolean, Occur.MUST);
})
;
var topDocs = indexSearcher.Search(query, 50, new Sort( //sort by relevance and then in lexicographical order
SortField.FIELD_SCORE,
new SortField("Surname_sort", SortFieldType.STRING),
new SortField("Name_sort", SortFieldType.STRING)
));
And highlighting:
var htmlFormatter = new SimpleHTMLFormatter();
var queryScorer = new QueryScorer(query);
var highlighter = new Highlighter(htmlFormatter, queryScorer);
foreach (var found in topDocs.ScoreDocs)
{
var document = indexSearcher.Doc(found.Doc);
var surname = document.Get("Surname"); //just for simplicity
var surnameFragment = highlighter.GetBestFragment(standardAnalyzer, "Surname", surname);
Console.WriteLine(surnameFragment);
}
The problem is that the highlighter returns results like this:
<b>abc</b>
<b>abcd</b>
<b>abcde</b>
<b>abcdef</b>
So it "highlights" entire words even though I was searching for partials.
Explain returned NON-MATCH all the way down, so I'm not sure it helps here.
Is it possible to highlight only the parts that were searched for, as in my example?
After searching a bit more, I concluded that to make this kind of highlighting work, one needs to change how the index is generated and split the indexed terms into parts so that offsets are calculated properly. Otherwise, the highlighter highlights the surrounding words (fragments) in their entirety.
So based on this I've managed to build a simple highlighter of my own.
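using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
public class Highlighter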
{
private const string TempStartToken = "\x02";
private const string TempEndToken = "\x03";
private const string SearchPatternTemplate = $"[{TempStartToken}{TempEndToken}]*{{0}}";
private const string ReplacePattern = $"{TempStartToken}$&{TempEndToken}";
private readonly ConcurrentDictionary<HighlightKey, Regex> _regexPatternsCache = new();
private static string GetHighlightTypeTemplate(HighlightType highlightType) =>
highlightType switch
{
HighlightType.Starts => "^{0}",
HighlightType.Contains => "{0}",
HighlightType.Ends => "{0}$",
HighlightType.Equals => "^{0}$",
_ => throw new ArgumentException($"Unsupported {nameof(HighlightType)}: '{highlightType}'", nameof(highlightType)),
};
public string Highlight(string text, IReadOnlySet<string> words, string startToken, string endToken, HighlightType highlightType)
{
foreach (var word in words)
{
var key = new HighlightKey
{
Word = word,
HighlightType = highlightType,
};
var regex = _regexPatternsCache.GetOrAdd(key, _ =>
{
var parts = word.Select(w => string.Format(SearchPatternTemplate, Regex.Escape(w.ToString())));
var pattern = string.Concat(parts);
var highlightPattern = string.Format(GetHighlightTypeTemplate(highlightType), pattern);
return new Regex(highlightPattern, RegexOptions.IgnoreCase | RegexOptions.CultureInvariant | RegexOptions.Compiled);
});
text = regex.Replace(text, ReplacePattern);
}
return text
.Replace(TempStartToken, startToken)
.Replace(TempEndToken, endToken)
;
}
private record HighlightKey
{
public string Word { get; init; }
public HighlightType HighlightType { get; init; }
}
}
public enum HighlightType
{
Starts,
Contains,
Ends,
Equals,
}
Use it like this (the snippet runs in LINQPad; Dump() and Util.RawHtml are LINQPad helpers):
var queries = new[] { "abc" }.ToHashSet();
var search = "a ab abc abcd abcde";
var highlighter = new Highlighter();
var outputs = search
.Split((string[])null, StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries)
.Select(w => highlighter.Highlight(w, queries, "<b>", "</b>", HighlightType.Starts))
;
var result = string.Join(" ", outputs).Dump();
Util.RawHtml(result).Dump();
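The output looks like this. The first line is the raw string, and the second is the same string as LINQPad renders it, with the tagged parts shown in bold:
a ab <b>abc</b> <b>abc</b>d <b>abc</b>de
a ab abc abcd abcde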
I'm open to any other better solutions.
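For the StartsWith case alone, a regex-free alternative is to split the text into words and wrap the matched prefix of each word directly. This is a minimal sketch. PrefixHighlighter is a hypothetical name, matching is ordinal and case-insensitive, words are split on single spaces, and it does not cover the Contains/Ends/Equals modes of the class above.

```csharp
using System;
using System.Linq;

public static class PrefixHighlighter
{
    // Splits on spaces and wraps the leading part of every word that starts with one of the queries.
    public static string Highlight(string text, string[] queries, string startToken, string endToken)
    {
        string[] words = text.Split(' ', StringSplitOptions.RemoveEmptyEntries);
        var highlighted = words.Select(word =>
        {
            // Prefer the longest matching query, so "abc" wins over "ab" when both are given.
            string match = queries
                .Where(q => q.Length > 0 && word.StartsWith(q, StringComparison.OrdinalIgnoreCase))
                .OrderByDescending(q => q.Length)
                .FirstOrDefault();

            // Keep the word's original casing; only insert the tokens around the prefix.
            return match == null
                ? word
                : startToken + word.Substring(0, match.Length) + endToken + word.Substring(match.Length);
        });
        return string.Join(" ", highlighted);
    }
}
```

Called as PrefixHighlighter.Highlight("a ab abc abcd abcde", new[] { "abc" }, "<b>", "</b>"), it produces the same string as the first output line above.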

Consistent Lucene.NET runtime exception on certain queries

I'm putting together a proof of concept for full-text search in our application using Lucene.NET. Some queries work fine; others seem to return results that don't match what the Luke tool returns. More problematically, this query:
(Description:tasty) (Gtin:00018389732061)
always yields this exception:
An unhandled exception of type 'System.IndexOutOfRangeException' occurred in Lucene.Net.dll
   at Lucene.Net.Search.TermScorer.Score() in d:\Lucene.Net\FullRepo\trunk\src\core\Search\TermScorer.cs:line 136
   at Lucene.Net.Search.BooleanScorer.BooleanScorerCollector.Collect(Int32 doc) in d:\Lucene.Net\FullRepo\trunk\src\core\Search\BooleanScorer.cs:line 88
   at Lucene.Net.Search.TermScorer.Score(Collector c, Int32 end, Int32 firstDocID) in d:\Lucene.Net\FullRepo\trunk\src\core\Search\TermScorer.cs:line 80
   at Lucene.Net.Search.BooleanScorer.Score(Collector collector, Int32 max, Int32 firstDocID) in d:\Lucene.Net\FullRepo\trunk\src\core\Search\BooleanScorer.cs:line 323
   at Lucene.Net.Search.BooleanScorer.Score(Collector collector) in d:\Lucene.Net\FullRepo\trunk\src\core\Search\BooleanScorer.cs:line 389
   at Lucene.Net.Search.IndexSearcher.Search(Weight weight, Filter filter, Collector collector) in d:\Lucene.Net\FullRepo\trunk\src\core\Search\IndexSearcher.cs:line 228
   at Lucene.Net.Search.IndexSearcher.Search(Weight weight, Filter filter, Int32 nDocs) in d:\Lucene.Net\FullRepo\trunk\src\core\Search\IndexSearcher.cs:line 188
   at Lucene.Net.Search.Searcher.Search(Query query, Filter filter, Int32 n) in d:\Lucene.Net\FullRepo\trunk\src\core\Search\Searcher.cs:line 108
   at Lucene.Net.Search.Searcher.Search(Query query, Int32 n) in d:\Lucene.Net\FullRepo\trunk\src\core\Search\Searcher.cs:line 118
   at...
If I use this query instead:
(Description:tasty) (Gtin:000)
I get results back. What is causing the exception in the top query? FWIW, here is the relevant code snippet:
protected virtual IList<Document> GetDocuments(BooleanQuery query, DirectoryInfo indexLocation, string defaultField)
{
var docs = new List<Document>();
using (var dir = new MMapDirectory(indexLocation))
{
using (var searcher = new IndexSearcher(dir))
{
var queryParser = new QueryParser(Constants.LuceneVersion, defaultField, new StandardAnalyzer(Constants.LuceneVersion));
TopDocs result = searcher.Search(query, Constants.MaxHits);
if (result == null) return docs;
foreach (var scoredoc in result.ScoreDocs.OrderByDescending(d => d.Score))
{
docs.Add(searcher.Doc(scoredoc.Doc));
}
return docs;
}
}
}
Based on the comments below, here is my current, unedited code, which still doesn't work.
protected virtual IList<Document> GetDocuments(BooleanQuery query, DirectoryInfo indexLocation, string defaultField)
{
var docs = new List<Document>();
using (var dir = new MMapDirectory(indexLocation))
{
using (var searcher = new IndexSearcher(dir))
{
using (var analyzer = new StandardAnalyzer(Constants.LuceneVersion))
{
var queryParser = new QueryParser(Constants.LuceneVersion, defaultField, analyzer);
var collector = TopScoreDocCollector.Create(Constants.MaxHits, true);
var parsed = queryParser.Parse(query.ToString());
searcher.Search(parsed, collector);
var docsresult = new List<string>();
var matches = collector.TopDocs().ScoreDocs;
foreach (var scoredoc in matches.OrderByDescending(d => d.Score))
{
docs.Add(searcher.Doc(scoredoc.Doc));
}
return docs;
}
}
}
}
Not strictly an answer as it "works on my machine". Posting as an answer so that I can share the unit test code that "works". Hopefully the OP can show what is different with their version.
This version assumes that the "Gtin" field is a string field and is not analyzed (as it seems to be a code).
[TestClass]
public class UnitTest4
{
[TestMethod]
public void TestLucene()
{
var writer = CreateIndex();
Add(writer, "tasty", "00018389732061");
writer.Flush(true, true, true);
var searcher = new IndexSearcher(writer.GetReader());
Test(searcher, "(Description:tasty) (Gtin:00018389732061)");
Test(searcher, "Description:tasty Gtin:00018389732061");
Test(searcher, "+Description:tasty +Gtin:00018389732061");
Test(searcher, "+Description:tasty +Gtin:000*");
writer.Dispose();
}
private void Test(IndexSearcher searcher, string query)
{
var result = Search(searcher, query);
Console.WriteLine(string.Join(", ", result));
Assert.AreEqual(1, result.Count);
Assert.AreEqual("00018389732061", result[0]);
}
private List<string> Search(IndexSearcher searcher, string expr)
{
using (var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30))
{
var queryParser = new QueryParser(Lucene.Net.Util.Version.LUCENE_30, "Description", analyzer);
var collector = TopScoreDocCollector.Create(1000, true);
var query = queryParser.Parse(expr);
searcher.Search(query, collector);
var result = new List<string>();
var matches = collector.TopDocs().ScoreDocs;
foreach (var item in matches)
{
var id = item.Doc;
var doc = searcher.Doc(id);
result.Add(doc.GetField("Gtin").StringValue);
}
return result;
}
}
IndexWriter CreateIndex()
{
var directory = new RAMDirectory();
var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);
var writer = new IndexWriter(directory, analyzer, new IndexWriter.MaxFieldLength(1000));
return writer;
}
void Add(IndexWriter writer, string desc, string id)
{
var document = new Document();
document.Add(new Field("Description", desc, Field.Store.YES, Field.Index.ANALYZED));
document.Add(new Field("Gtin", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
writer.AddDocument(document);
}
}

How to generate mdx query using C#?

I am new to MDX queries and very curious about generating them with C#, so I searched for demos and open-source projects and found Ranet.olap (https://ranetuilibraryolap.codeplex.com/), which provides what I need.
After referencing the DLLs, I tried to incorporate them into my code. Below is my full console code, which should generate an MDX query but doesn't. Am I doing something wrong?
using System;
using System.Collections.Generic;
using Microsoft.AnalysisServices.AdomdClient;
using Ranet.Olap.Core.Managers;
using Ranet.Olap.Core.Metadata;
using Ranet.Olap.Core.Types;
namespace MDX
{
class Program
{
static void Main(string[] args)
{
startWork();
}
public static void startWork()
{
string connString = "Provider=MSOLAP.3; Data Source=localhost;Initial Catalog=AdventureWorkDW2008R2;Integrated Security=SSPI;";
CubeDef cubes;
AdomdConnection conn = new AdomdConnection(connString);
conn.Open();
cubes = conn.Cubes.Find("AdventureWorkCube");
Ranet.Olap.Core.Managers.MdxQueryBuilder mdx = new Ranet.Olap.Core.Managers.MdxQueryBuilder();
mdx.Cube = cubes.Caption;
List<Ranet.Olap.Core.Wrappers.AreaItemWrapper> listColumn = new List<Ranet.Olap.Core.Wrappers.AreaItemWrapper>();
List<Ranet.Olap.Core.Wrappers.AreaItemWrapper> listRow = new List<Ranet.Olap.Core.Wrappers.AreaItemWrapper>();
List<Ranet.Olap.Core.Wrappers.AreaItemWrapper> listData = new List<Ranet.Olap.Core.Wrappers.AreaItemWrapper>();
//Column area
Dimension dmColumn = cubes.Dimensions.Find("Dim Product");
Microsoft.AnalysisServices.AdomdClient.Hierarchy hColumn = dmColumn.Hierarchies["English Product Name"];
//hierarchy properties
List<PropertyInfo> lPropInfo = new List<PropertyInfo>();
foreach (var prop in hColumn.Properties)
{
PropertyInfo p = new PropertyInfo();
p.Name = prop.Name;
p.Value = prop.Value;
lPropInfo.Add(p);
}
Ranet.Olap.Core.Wrappers.AreaItemWrapper areaIColumn = new Ranet.Olap.Core.Wrappers.AreaItemWrapper();
areaIColumn.AreaItemType = AreaItemWrapperType.Hierarchy_AreaItemWrapper;
areaIColumn.Caption = hColumn.Caption;
areaIColumn.CustomProperties = lPropInfo;
listColumn.Add(areaIColumn);
//Rows Area
Dimension dmRow = cubes.Dimensions.Find("Due Date");
Microsoft.AnalysisServices.AdomdClient.Hierarchy hRow = dmRow.Hierarchies["English Month Name"];
List<PropertyInfo> lRowPropInfo = new List<PropertyInfo>();
foreach (var prop in hRow.Properties)
{
PropertyInfo p = new PropertyInfo(prop.Name,prop.Value);
lRowPropInfo.Add(p);
}
Ranet.Olap.Core.Wrappers.AreaItemWrapper areaIRow = new Ranet.Olap.Core.Wrappers.AreaItemWrapper();
areaIRow.AreaItemType = AreaItemWrapperType.Hierarchy_AreaItemWrapper;
areaIRow.Caption = hRow.Caption;
areaIRow.CustomProperties = lRowPropInfo;
listRow.Add(areaIRow);
//Measure Area or Data Area
Measure ms = cubes.Measures.Find("Order Quantity");
Ranet.Olap.Core.Wrappers.AreaItemWrapper areaIData = new Ranet.Olap.Core.Wrappers.AreaItemWrapper();
areaIData.AreaItemType = AreaItemWrapperType.Measure_AreaItemWrapper;
areaIData.Caption = ms.Caption;
List<PropertyInfo> lmpropInfo = new List<PropertyInfo>();
foreach (var prop in ms.Properties)
{
PropertyInfo p = new PropertyInfo(prop.Name, prop.Value);
lmpropInfo.Add(p);
}
areaIData.CustomProperties = lmpropInfo;
listData.Add(areaIData);
mdx.AreaWrappersColumns = listColumn;
mdx.AreaWrappersRows = listRow;
mdx.AreaWrappersData = listData;
string mdxQuery = mdx.GenerateMdxQuery();
conn.Close();
}
}
}
A simple example of generating an MDX query (Ranet OLAP 3.7 only):
using System.Collections.Generic;
using Ranet.Olap.Core.Data;
using Ranet.Olap.Core.Managers;
using Ranet.Olap.Core.Types;
using Ranet.Olap.Core.Wrappers;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
startWork();
}
public static void startWork()
{
var mdx = new QueryBuilderParameters
{
CubeName = "[Adventure Works]",
SubCube = "",
MdxDesignerSetting = new MDXDesignerSettingWrapper(),
CalculatedMembers = new List<CalcMemberInfo>(),
CalculatedNamedSets = new List<CalculatedNamedSetInfo>(),
AreaWrappersFilter = new List<AreaItemWrapper>(),
AreaWrappersColumns = new List<AreaItemWrapper>(),
AreaWrappersRows = new List<AreaItemWrapper>(),
AreaWrappersData = new List<AreaItemWrapper>()
};
//define parameters
mdx.MdxDesignerSetting.HideEmptyColumns = false;
mdx.MdxDesignerSetting.HideEmptyRows = false;
mdx.MdxDesignerSetting.UseVisualTotals = false;
mdx.MdxDesignerSetting.SubsetCount = 0;
var itemCol1 = new Hierarchy_AreaItemWrapper
{
AreaItemType = AreaItemWrapperType.Hierarchy_AreaItemWrapper,
UniqueName = "[Customer].[Customer Geography]"
};
mdx.AreaWrappersColumns.Add(itemCol1);
var itemRow1 = new Hierarchy_AreaItemWrapper
{
AreaItemType = AreaItemWrapperType.Hierarchy_AreaItemWrapper,
UniqueName = "[Date].[Calendar]"
};
mdx.AreaWrappersRows.Add(itemRow1);
var itemData1 = new Measure_AreaItemWrapper();
itemData1.AreaItemType = AreaItemWrapperType.Measure_AreaItemWrapper;
itemData1.UniqueName = "[Measures].[Internet Order Count]";
mdx.AreaWrappersData.Add(itemData1);
string query = MdxQueryBuilder.Default.BuildQuery(mdx, null);
}
}
}
MDX Query result:
SELECT
HIERARCHIZE(HIERARCHIZE([Customer].[Customer Geography].Levels(0).Members)) DIMENSION PROPERTIES PARENT_UNIQUE_NAME, HIERARCHY_UNIQUE_NAME, CUSTOM_ROLLUP, UNARY_OPERATOR, KEY0 ON 0,
HIERARCHIZE(HIERARCHIZE([Date].[Calendar].Levels(0).Members)) DIMENSION PROPERTIES PARENT_UNIQUE_NAME, HIERARCHY_UNIQUE_NAME, CUSTOM_ROLLUP, UNARY_OPERATOR, KEY0 ON 1
FROM
[Adventure Works]
WHERE ([Measures].[Internet Order Count])
CELL PROPERTIES BACK_COLOR, CELL_ORDINAL, FORE_COLOR, FONT_NAME, FONT_SIZE, FONT_FLAGS, FORMAT_STRING, VALUE, FORMATTED_VALUE, UPDATEABLE, ACTION_TYPE
I'm still in the process of revising the code for this engine, but here are some suggestions:
It looks like you just grab the cube metadata (dimensions, measures, etc.) and pass it to the generator. That doesn't sound like a way to generate MDX. An MDX statement should look like this:
select
{
// measures, calculated members
} on 0,
{
// dimension data - sets
} on 1 // probably more axis
from [Cube]
All other parts of the statement are optional.
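To make that shape concrete, here is a hand-rolled sketch that only concatenates strings, with no Ranet involved. MdxSketch.Build is a hypothetical helper, and the cube, measure and hierarchy names are borrowed from the Ranet example earlier in the thread. Nothing is validated or escaped.

```csharp
using System.Collections.Generic;

public static class MdxSketch
{
    // Composes the minimal statement: SELECT { axis0 } ON 0, { axis1 } ON 1 FROM cube.
    // Each element is an MDX expression string (a measure, a member, or a set expression).
    public static string Build(string cube, IEnumerable<string> axis0, IEnumerable<string> axis1)
    {
        return "SELECT\n"
            + "{ " + string.Join(", ", axis0) + " } ON 0,\n"
            + "{ " + string.Join(", ", axis1) + " } ON 1\n"
            + "FROM " + cube;
    }
}
```

For example, Build("[Adventure Works]", new[] { "[Measures].[Internet Order Count]" }, new[] { "[Date].[Calendar].Levels(0).Members" }) puts the measure on axis 0 and the calendar members on axis 1. The resulting string could then be run over the question's AdomdConnection, for instance with new AdomdCommand(mdx, conn).ExecuteCellSet().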

Lucene query not returning hit on standard analyzer

I have a file named thatfeelwhen.pdf. When I search for words like "that" or "feel", I don't get a hit, but I do if I type "when" or the entire filename. I'm using a standard analyzer. How can I get the Lucene searcher to match everything? My search queries seem to match the content within the file but not the filename.
public partial class _Default : Page
{
Directory finalDirectory = null;
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
// Code below is in other methods:
private static void AddTextToIndex(string filename, string pdfBody, IndexWriter writer)
{
Document doc = new Document();
doc.Add(new Field("fileName", filename.ToString(), Field.Store.YES, Field.Index.ANALYZED));
doc.Add(new Field("pdfBody", pdfBody.ToString(), Field.Store.NO, Field.Index.ANALYZED));
writer.AddDocument(doc);
}
private static Directory buildIndex(Analyzer analyzer)
{
string[] syllabusFiles = System.IO.Directory.GetFiles(@"C:\mywebsite\files\forms");
Directory directory = FSDirectory.Open(new DirectoryInfo(@"C:\mywebsite\files\LuceneIndex"));
var writer = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.LIMITED);
int j = 0;
while (j < syllabusFiles.Length)
{
string pdfTextExtracted = pdfText(syllabusFiles[j]);
string fileNameOnly = syllabusFiles[j].Replace("C:\\website\\files\\forms", "");
AddTextToIndex(fileNameOnly, pdfTextExtracted, writer);
j++;
}
writer.Optimize();
writer.Dispose();
return directory;
}
protected void txtBoxSearchPDF_Click(object sender, EventArgs e)
{
if (txtBoxSearchString.Text == "")
{
lblNoSearchString.Visible = true;
}
else if (txtBoxSearchString.Text == "build_index")
{
this.finalDirectory = buildIndex(this.analyzer);
}
else
{
//searching PDF text
lblNoSearchString.Visible = false;
StringBuilder sb = new StringBuilder();
this.finalDirectory = FSDirectory.Open(new DirectoryInfo(@"C:\mywebsite\files\LuceneIndex"));
IndexReader indexReader = IndexReader.Open(this.finalDirectory, true);
Searcher indexSearch = new IndexSearcher(indexReader);
string searchQuery = txtBoxSearchString.Text;
var fields = new[] { "fileName", "pdfBody" };
var queryParser = new MultiFieldQueryParser(Version.LUCENE_30, fields, this.analyzer);
Query query;
try
{
query = queryParser.Parse(searchQuery.Trim());
}
catch (ParseException)
{
query = queryParser.Parse(QueryParser.Escape(searchQuery.Trim()));
}
TopDocs resultDocs = indexSearch.Search(query, indexReader.MaxDoc);
var hits = resultDocs.ScoreDocs;
foreach (var hit in hits)
{
var documentFromSearcher = indexSearch.Doc(hit.Doc);
string getResult = documentFromSearcher.Get("fileName");
string formattedResult = getResult.Replace(" ", "%20");
sb.AppendLine(@"" + getResult + "");
sb.AppendLine("<br>");
}
I chose to use Analyzer analyzer = new SingleCharTokenAnalyzer(); and am getting much better results.
I tried the Simple, Standard, Whitespace, and Keyword analyzers, and none of them really suited my needs without extra work to customize them.
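Here is why single-character tokens help, as I understand it. StandardAnalyzer cannot split run-together words, so "thatfeelwhen" stays in one token and "feel" never equals an indexed term. If every character becomes its own token, the query parser turns a multi-character query such as feel into a phrase query over those characters. Lucene.Net 3.0's QueryParser does this when analysis yields several tokens. A phrase match over single characters is effectively a substring match. The plain C# sketch below only models that idea. Its tokenizing rule (letters and digits kept, everything else dropped) is an assumption, and the real SingleCharTokenAnalyzer may treat punctuation and digits differently. SingleCharMatchSketch is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SingleCharMatchSketch
{
    // Assumed tokenizer behaviour: one lowercase token per letter or digit,
    // everything else (dots, spaces, underscores) dropped.
    public static List<char> Tokenize(string text) =>
        text.Where(c => char.IsLetterOrDigit(c)).Select(c => char.ToLowerInvariant(c)).ToList();

    // A phrase query matches when all query tokens occur at consecutive positions,
    // which over single-character tokens amounts to a substring match.
    public static bool PhraseMatches(string fieldText, string queryText)
    {
        List<char> doc = Tokenize(fieldText);
        List<char> query = Tokenize(queryText);
        if (query.Count == 0) return false;

        for (int start = 0; start + query.Count <= doc.Count; start++)
        {
            if (Enumerable.Range(0, query.Count).All(i => doc[start + i] == query[i]))
                return true;
        }
        return false;
    }
}
```

Under these assumptions, "feel" and "THAT" both match "thatfeelwhen.pdf", while "fell" does not. The trade-off is a much larger number of tokens per field, which is why this approach suits short fields such as file names better than document bodies.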

Boosting fields or documents has no effect in Lucene.Net

I am trying to get boosting to work, so I can boost docs and/or fields to make the search-result as I like it to be.
However, I am unable to make boosting docs or fields have ANY effect at all on the scoring.
Either Lucene.Net boosting does not work (not very likely) or I am misunderstanding something (very likely).
Here is my showcase code, stripped down to the bare essentials:
using System;
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
namespace SO_LuceneTest
{
class Program
{
static void Main(string[] args)
{
const string INDEXNAME = "TextIndex";
var writer = new IndexWriter(INDEXNAME, new SimpleAnalyzer(), true);
writer.DeleteAll();
var persons = new Dictionary<string, string>
{
{ "Smithers", "Jansen" },
{ "Jan", "Smith" }
};
foreach (var p in persons)
{
var doc = new Document();
var firstnameField = new Field("Firstname", p.Key, Field.Store.YES, Field.Index.ANALYZED);
var lastnameField = new Field("Lastname", p.Value, Field.Store.YES, Field.Index.ANALYZED);
//firstnameField.SetBoost(2.0f);
doc.Add(firstnameField);
doc.Add(lastnameField);
writer.AddDocument(doc);
}
writer.Commit();
writer.Close();
var term = "jan*";
var queryFields = new string[] { "Firstname", "Lastname" };
var boosts = new Dictionary<string, float>();
//boosts.Add("Firstname", 10);
QueryParser mqp = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_24, queryFields, new SimpleAnalyzer(), boosts);
var query = mqp.Parse(term);
IndexSearcher searcher = new IndexSearcher(INDEXNAME);
Hits hits = searcher.Search(query);
int results = hits.Length();
Console.WriteLine("Found {0} results", results);
for (int i = 0; i < results; i++)
{
Document doc = hits.Doc(i);
Console.WriteLine("{0} {1}\t\t{2}", doc.Get("Firstname"), doc.Get("Lastname"), hits.Score(i));
}
searcher.Close();
Console.WriteLine("...");
Console.Read();
}
}
}
I have commented out two instances of boosting. When they are included, the score is exactly the same as without boosting.
What am I missing here?
I am using Lucene.Net v2.9.2.2, the latest version as of now.
Please try whether this works for you. It does for me, but you will have to modify it, because I have a lot of other code that I won't include in this post unless necessary. The main difference is the use of TopFieldCollector to get the results.
var dir = SimpleFSDirectory.Open(new DirectoryInfo(IndexPath));
var ixSearcher = new IndexSearcher(dir, false);
var qp = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, f_Text, analyzer);
query = CleanQuery(query);
Query q = qp.Parse(query);
TopFieldCollector collector = TopFieldCollector.Create(
new Sort(new SortField(null, SortField.SCORE, false), new SortField(f_Date, SortField.LONG, true)),
MAX_RESULTS,
false, // fillFields - not needed, we want score and doc only
true, // trackDocScores - need doc and score fields
true, // trackMaxScore - related to trackDocScores
false); // should docs be in docId order?
ixSearcher.Search(q, collector);
TopDocs topDocs = collector.TopDocs();
ScoreDoc[] hits = topDocs.ScoreDocs;
uint pageCount = (uint)Math.Ceiling((double)hits.Length / pageSize);
for (uint i = pageIndex * pageSize; i < (pageIndex + 1) * pageSize; i++) {
if (i >= hits.Length) {
break;
}
int doc = hits[i].Doc;
Content c = new Content {
Title = ixSearcher.Doc(doc).GetField(f_Title).StringValue(),
Text = FragmentOnOrgText(ixSearcher.Doc(doc).GetField(f_TextOrg).StringValue(), highligter.GetBestFragments(analyzer, ixSearcher.Doc(doc).GetField(f_Text).StringValue(), maxNumberOfFragments)),
Date = DateTools.StringToDate(ixSearcher.Doc(doc).GetField(f_Date).StringValue()),
Score = hits[i].Score
};
rv.Add(c);
}
ixSearcher.Close();
