Lucene search criteria change the word - C#

I use Lucene for searching.
For each doc in the index I have a field called "uniqueIdentifier" of type string.
When I want to find all items with "uniqueIdentifier" == "haaglanden", I use the following code:
var searcher = Examine.ExamineManager.Instance.SearchProviderCollection["RegionsSearcher"];
var searchCriteria = searcher.CreateSearchCriteria(BooleanOperation.And);
var temp = searchCriteria.RawQuery("+uniqueIdentifier:" + uniqueIdentifier);
In temp I see:
LuceneQuery: {+(+uniqueIdentifier:haagland)}
But "haagland" != "haaglanden", so I cannot find my docs.
How can I build a query with "haaglanden"?

The cause was the analyzer: DutchAnalyzer stems terms, which is why "haaglanden" became "haagland" in the query.
I switched from Lucene.Net.Analysis.Nl.DutchAnalyzer to Lucene.Net.Analysis.Standard.StandardAnalyzer.
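If you would rather keep the DutchAnalyzer for free-text fields, another common approach is to index identifiers as a single unanalyzed term and query them with a TermQuery, which never passes through an analyzer. The sketch below shows this in plain Lucene.Net 3.0.3 with an in-memory RAMDirectory; the class and method names are hypothetical, and configuring the equivalent in an Examine index set depends on your Examine version and is not shown.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

public static class IdentifierIndex
{
    // Builds an in-memory index with one document whose identifier is a single, unanalyzed term.
    public static Directory Build(string identifier)
    {
        var directory = new RAMDirectory();
        // Analyzer for any analyzed fields; NOT_ANALYZED fields bypass it entirely.
        var analyzer = new StandardAnalyzer(Version.LUCENE_30);
        using (var writer = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            var doc = new Document();
            // NOT_ANALYZED indexes the whole value as-is: no lowercasing, no stemming.
            doc.Add(new Field("uniqueIdentifier", identifier, Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.AddDocument(doc);
        }
        return directory;
    }

    // Counts documents whose uniqueIdentifier equals the given value exactly.
    public static int CountExact(Directory directory, string identifier)
    {
        using (var searcher = new IndexSearcher(directory, true))
        {
            // A TermQuery is not analyzed, so "haaglanden" is searched as "haaglanden".
            var query = new TermQuery(new Term("uniqueIdentifier", identifier));
            return searcher.Search(query, 10).TotalHits;
        }
    }
}
```

With this setup, searching for "haaglanden" finds the document and "haagland" does not, because no stemmer runs on either side.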

Related

Check if XML node value already exists in an XML file using C#

Please note that I'm new to C# and am learning it right now :) I couldn't find anything similar to my problem, so I came here.
I have an application in which I add customers (it's in the final stage). All customers are stored in an XML file, and every customer gets a new customer number. In my XML file I have an XmlNode called CustNo. Now, if the user adds a new customer and types in a number which already exists, a message box should pop up saying that this number already exists. I have this C# code:
XDocument xdoc = XDocument.Load(path + "\\save.xml");
var xmlNodeExist = String.Format("Buchhaltung/Customers/CustNo");
var CustNoExist = xdoc.XPathSelectElement(xmlNodeExist);
if (CustNoExist != null)
{
    MessageBox.Show("asdf");
}
And my XML file looks like this:
<Buchhaltung>
  <Customers>
    <CustNo>12</CustNo>
    <Surname>Random</Surname>
    <Forename>Name</Forename>
    <Addr>Address</Addr>
    <Zip>12345</Zip>
    <Place>New York</Place>
    <Phone>1234567890</Phone>
    <Mail>example#test.com</Mail>
  </Customers>
  <Customers>
    <CustNo>13</CustNo>
    <Surname>Other</Surname>
    <Forename>Forename</Forename>
    <Addr>My Address</Addr>
    <Zip>67890</Zip>
    <Place>Manhattan</Place>
    <Phone>0987654321</Phone>
    <Mail>test#example.com</Mail>
  </Customers>
</Buchhaltung>
But then the message box always pops up. What am I doing wrong?
That's because your XPath returns all CustNo elements, regardless of their content.
Try the following:
var myNumber = 12;
var xmlNodeExist = String.Format("Buchhaltung/Customers/CustNo[. = {0}]", myNumber.ToString());
or, using FirstOrDefault and LINQ to XML:
var myNumber = 12;
var xmlNodeExist = "Buchhaltung/Customers/CustNo";
var CustNoExist = xdoc.XPathSelectElements(xmlNodeExist).FirstOrDefault(x => (int)x == myNumber);
You are currently testing for the existence of any 'CustNo' element. See this reference about the XPath syntax.
Your XPath should say something like this:
Buchhaltung//Customers[CustNo='12']
which means "any Customers element containing a 'CustNo' element with value '12'".
Combining that with your current code:
var custNoGivenByCustomer = "12";
var xmlNodeExistsXpath = String.Format("Buchhaltung//Customers[CustNo='{0}']", custNoGivenByCustomer );
var CustNoExist = xdoc.XPathSelectElement(xmlNodeExistsXpath);
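A self-contained sketch of the XPath predicate approach, working on an XDocument you pass in rather than loading save.xml. The class and method names are hypothetical. Because the value is pasted into the XPath string, it should be validated first (for example, digits only); a value containing a single quote would break the expression.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

public static class CustomerXPath
{
    // Returns true if some Customers element has a CustNo child whose text equals custNo.
    public static bool CustomerNumberExists(XDocument xdoc, string custNo)
    {
        // The predicate [CustNo='...'] filters on the value. Without it the path
        // matches every CustNo element, so the result is never null.
        var xpath = String.Format("Buchhaltung/Customers[CustNo='{0}']", custNo);
        return xdoc.XPathSelectElement(xpath) != null;
    }
}
```

In the form, `if (CustomerXPath.CustomerNumberExists(xdoc, textBox1.Text))` would then replace the original null check.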
You can use LINQ to XML:
var number = textBox1.Text;
var CustNoExist = xdoc.Descendants("CustNo").Any(x => (string)x == number);
if (CustNoExist)
{
    MessageBox.Show("asdf");
}
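The LINQ to XML variant can also compare numerically instead of as text. This sketch (hypothetical class and method names) parses the text box input with int.TryParse and uses the explicit (int) conversion on XElement for each CustNo, so a non-numeric CustNo in the file would throw a FormatException.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class CustomerLinq
{
    // Returns true if any CustNo element in the document has the given numeric value.
    public static bool CustomerNumberExists(XDocument xdoc, int number)
    {
        // (int)x parses the element's text; it throws FormatException for non-integers.
        return xdoc.Descendants("CustNo").Any(x => (int)x == number);
    }

    // Hypothetical overload for raw text box input: non-numeric input never matches.
    public static bool CustomerNumberExists(XDocument xdoc, string input)
    {
        int number;
        if (!int.TryParse(input, out number))
        {
            return false;
        }
        return CustomerNumberExists(xdoc, number);
    }
}
```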
This is because you select the CustNo elements regardless of their value. This will filter it to the desired customer number:
int custNo = 12;
var xmlNodeExist = String.Format("Buchhaltung/Customers[CustNo={0}]", custNo);
It selects the Customers elements instead, but since you're just checking for existence, that's unimportant.
W3Schools has a good tutorial/reference on XPath.

Lucene.NET searching

Hi, I am trying to make an autocomplete system using the Lucene library to search over 170K records.
But there is a little problem.
For example, when I search for Candice Gra(...), it returns records like
Candice Jackson
Candice Hamilton
Candice Hayes
but not Candice Graham. To make Lucene find Candice Graham, I need to type Candice Graham exactly.
Here is the code with which I'm building the query:
Directory directory = FSDirectory.Open(new DirectoryInfo(context.Server.MapPath("
ISet<string> stopWordSet = new HashSet<string>(stopWords);
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30, stopWordSet);
IndexReader indexReader = IndexReader.Open(directory, true);
Searcher indexSearch = new IndexSearcher(indexReader);
//Singe Field Search
var queryParser = new QueryParser(Version.LUCENE_30,
"Title",
analyzer);
string strQuery = string.Format("{0}", q);
var query = queryParser.Parse(strQuery);
If I build strQuery like this (with * appended to the query):
string strQuery = string.Format("{0}*", q);
then it brings back irrelevant records too.
For example, if I search for Candice Gra(...) again, it returns records like
Grass
Gravity
Gray (etc.)
By the way, I tried KeywordAnalyzer and SimpleAnalyzer, but they did not work either.
Any ideas?
You should escape your spaces if you want them included in the search:
var query = queryParser.Parse(QueryParser.Escape(strQuery));
I think you need to put an AND keyword between these two words.
"Candice" AND "Gra"
http://lucene.apache.org/core/2_9_4/queryparsersyntax.html#AND
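Combining the two ideas above (every word required, and only the last, unfinished word treated as a prefix) should avoid both the missing Candice Graham and the stray Grass/Gravity hits. This is a sketch for Lucene.Net 3.0.3 that builds the query by hand instead of using QueryParser. It assumes the Title field was indexed with StandardAnalyzer, so it lowercases the input itself; it does not remove stop words or punctuation the way the analyzer would. The class name is hypothetical.

```csharp
using System;
using Lucene.Net.Index;
using Lucene.Net.Search;

public static class AutocompleteQuery
{
    // "Candice Gra" -> every complete word must match exactly,
    // and the last (possibly unfinished) word must match as a prefix.
    public static Query Build(string field, string input)
    {
        // StandardAnalyzer lowercased the indexed terms, and these queries are not analyzed.
        string[] words = input.ToLowerInvariant()
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        var query = new BooleanQuery();
        for (int i = 0; i < words.Length; i++)
        {
            var term = new Term(field, words[i]);
            Query clause = i == words.Length - 1
                ? (Query)new PrefixQuery(term)
                : new TermQuery(term);
            query.Add(clause, Occur.MUST);
        }
        return query;
    }
}
```

For "Candice Gra" the resulting query prints as `+Title:candice +Title:gra*`, so a title must contain "candice" and a word starting with "gra".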

XDocument parsing

I have created a custom XDocument in C#, and it looks like the following:
<Filters datetimegenerated="28.07.2013 23:12PM">
  <SimpleStringFilter column="xxx" table="yyy" groupby="True" seperatereport="true">
    good,bad,ugly
  </SimpleStringFilter>
  <NumaricalFilter column="zzz" table = "mmm">zzz = 100 or zzz= 50</NumaricalFilter>
</Filters>
Parsing it in C# doesn't seem to work. Here is my code for parsing the SimpleStringFilter tags; however, I get a count of zero from the above sample:
var filters = from simplestringfilter in xdoc.Root.Element("Filters").Elements("SimpleStringFilter")
              let column = simplestringfilter.Attribute("column")
              let table = simplestringfilter.Attribute("table")
              let groupby = simplestringfilter.Attribute("groupby")
              let seperatecolumnby = simplestringfilter.Attribute("seperatereport")
              let filterstringval = simplestringfilter.Value
              select new
              {
                  Column = column,
                  Table = table,
                  GroupBy = groupby,
                  SeperateColumnBy = seperatecolumnby,
                  Filterstring = filterstringval
              };
What am I doing wrong?
Your query starts from the root element and looks for a child Filters element. Since the root is the Filters element, that fails, which is why you are not getting any results.
There are two ways to resolve this problem. The first is simply not to look for Filters under the root, and your query should be fine.
var filters =
from simplestringfilter in xdoc.Root.Elements("SimpleStringFilter")
...
A better way to write it, IMHO, would be to query off of the document itself rather than the root. It looks more natural.
var filters =
from simplestringfilter in xdoc.Element("Filters")
.Elements("SimpleStringFilter")
...
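A self-contained version of the corrected query, run against the sample document from the question. The class name and the choice to return only the trimmed text of each filter are for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class FilterParsing
{
    // The sample document from the question.
    public const string SampleXml = @"<Filters datetimegenerated=""28.07.2013 23:12PM"">
  <SimpleStringFilter column=""xxx"" table=""yyy"" groupby=""True"" seperatereport=""true"">
    good,bad,ugly
  </SimpleStringFilter>
  <NumaricalFilter column=""zzz"" table = ""mmm"">zzz = 100 or zzz= 50</NumaricalFilter>
</Filters>";

    // Returns the trimmed text content of every SimpleStringFilter element.
    public static List<string> ReadFilterStrings(XDocument xdoc)
    {
        // xdoc.Element("Filters") is the root element itself, whereas
        // xdoc.Root.Element("Filters") looks for a Filters child of the root and finds none.
        var filters =
            from simplestringfilter in xdoc.Element("Filters").Elements("SimpleStringFilter")
            select simplestringfilter.Value.Trim();
        return filters.ToList();
    }
}
```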

How to do regular expression search using Lucene.Net

I'm using Lucene.Net version 3.0.3. I want to do a regular expression search. I tried the following code:
// code
String SearchExpression = "[DM]ouglas";
const int hitsLimit = 1000000;
//state the file location of the index
string indexFileLocation = IndexLocation;
Lucene.Net.Store.Directory dir = Lucene.Net.Store.FSDirectory.Open(indexFileLocation);
//create an index searcher that will perform the search
Lucene.Net.Search.IndexSearcher searcher = new Lucene.Net.Search.IndexSearcher(dir);
var analyzer = new WhitespaceAnalyzer();
var parser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_30, new[] {
    Field_Content, }, analyzer);
Term t = new Term(Field_Content, SearchExpression);
RegexQuery scriptQuery = new RegexQuery(t);
string s = string.Format("{0}", SearchExpression);
var query = parser.Parse(s);
BooleanQuery booleanQuery = new BooleanQuery();
booleanQuery.Add(query, Occur.MUST);
var hits = searcher.Search(booleanQuery, null, hitsLimit, Sort.RELEVANCE).ScoreDocs;
foreach (var hit in hits)
{
    var hitDocument = searcher.Doc(hit.Doc);
    string contentValue = hitDocument.Get(Field_Content);
}
// end of code
When I search with the pattern "Do*uglas", I get results.
But if I search with the pattern "[DM]ouglas", I get the following error:
"Cannot parse '[DM]ouglas': Encountered " "]" "] "" at line 1, column 3. Was expecting one of: "TO" ... <RANGEIN_QUOTED> ... <RANGEIN_GOOP> ...".
I also tried a simple pattern like ".ouglas", which should give me results, as I have "Douglas" in my text content.
Does anyone know how to do regular expression search using lucene.Net version 3.0.3?
The classic QueryParser (which MultiFieldQueryParser extends) does not support regular expressions at all. Instead, it is attempting to interpret that portion of the query as a range query.
If you wish to use regexes to search, you will need to construct a RegexQuery manually. Note that RegexQuery performance tends to be poor. You might be able to improve it by switching from JavaUtilRegexCapabilities to JakartaRegexpCapabilities.
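Constructing the query by hand means searching with the RegexQuery the question already creates, instead of the output of parser.Parse. This sketch assumes Lucene.Net 3.0.3 with the Contrib.Regex assembly referenced; the namespace `Contrib.Regex` in the using directive is an assumption, so adjust it to match your build. The class name is hypothetical. The pattern is matched against indexed terms, not the original text, so if Field_Content was indexed with a lowercasing analyzer, write the pattern in lowercase (for example `[dm]ouglas`).

```csharp
using Contrib.Regex;   // assumption: namespace of RegexQuery in the Lucene.Net 3.0.3 contrib assembly
using Lucene.Net.Index;
using Lucene.Net.Search;

public static class RegexSearch
{
    // Wraps a RegexQuery on one field in a BooleanQuery, bypassing QueryParser entirely.
    public static Query Build(string field, string pattern)
    {
        // The regex is applied to each indexed term of the field.
        var regexQuery = new RegexQuery(new Term(field, pattern));
        var booleanQuery = new BooleanQuery();
        booleanQuery.Add(regexQuery, Occur.MUST);
        return booleanQuery;
    }
}
```

It can then take the place of booleanQuery in the original call: `searcher.Search(RegexSearch.Build(Field_Content, "[DM]ouglas"), null, hitsLimit, Sort.RELEVANCE)`.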

Why does this Lucene.Net query fail?

I am trying to convert my search functionality to allow for fuzzy searches involving multiple words. My existing search code looks like:
// Split the search into separate queries per word, and combine them into one major query
var finalQuery = new BooleanQuery();
string[] terms = searchString.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries);
foreach (string term in terms)
{
    // Setup the fields to search
    string[] searchfields = new string[]
    {
        // Various strings denoting the document fields available
    };
    var parser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_29, searchfields, new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29));
    finalQuery.Add(parser.Parse(term), BooleanClause.Occur.MUST);
}
// Perform the search
var directory = FSDirectory.Open(new DirectoryInfo(LuceneIndexBaseDirectory));
var searcher = new IndexSearcher(directory, true);
var hits = searcher.Search(finalQuery, MAX_RESULTS);
This works correctly: if I have an entity whose name field is "My name is Andrew" and I search for "Andrew Name", Lucene finds the correct document. Now I want to enable fuzzy searching, so that "Anderw Name" is found as well. I changed my method to use the following code:
const int MAX_RESULTS = 10000;
const float MIN_SIMILARITY = 0.5f;
const int PREFIX_LENGTH = 3;
if (string.IsNullOrWhiteSpace(searchString))
    throw new ArgumentException("Provided search string is empty");
// Split the search into separate queries per word, and combine them into one major query
var finalQuery = new BooleanQuery();
string[] terms = searchString.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries);
foreach (string term in terms)
{
    // Setup the fields to search
    string[] searchfields = new string[]
    {
        // Strings denoting document field names here
    };
    // Create a subquery where the term must match at least one of the fields
    var subquery = new BooleanQuery();
    foreach (string field in searchfields)
    {
        var queryTerm = new Term(field, term);
        var fuzzyQuery = new FuzzyQuery(queryTerm, MIN_SIMILARITY, PREFIX_LENGTH);
        subquery.Add(fuzzyQuery, BooleanClause.Occur.SHOULD);
    }
    // Add the subquery to the final query; every subquery must match
    finalQuery.Add(subquery, BooleanClause.Occur.MUST);
}
// Perform the search
var directory = FSDirectory.Open(new DirectoryInfo(LuceneIndexBaseDirectory));
var searcher = new IndexSearcher(directory, true);
var hits = searcher.Search(finalQuery, MAX_RESULTS);
Unfortunately, with this code if I submit the search query "Andrew Name" (same as before) I get zero results back.
The core idea is that all terms must be found in at least one document field, but each term can reside in different fields. Does anyone have any idea why my rewritten query fails?
Final edit: OK, it turns out I was overcomplicating this by a LOT, and there was no need to change from my first approach. After reverting to the first code snippet, I enabled fuzzy searching by changing
finalQuery.Add(parser.Parse(term), BooleanClause.Occur.MUST);
to
finalQuery.Add(parser.Parse(term.Replace("~", "") + "~"), BooleanClause.Occur.MUST);
Your code works for me if I rewrite the searchString to lower-case. I'm assuming that you're using the StandardAnalyzer when indexing, and it will generate lower-case terms.
You need to 1) pass your tokens through the same analyzer (to enable identical processing), 2) apply the same logic as the analyzer or 3) use an analyzer which matches the processing you do (WhitespaceAnalyzer).
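Option 2 applied to the fuzzy version from the question: lowercase each word before building the FuzzyQuery objects, since FuzzyQuery does no analysis of its own. This is a sketch against the Lucene.Net 2.9 API the question uses (BooleanClause.Occur); the class name is hypothetical, and the lowercasing only mirrors StandardAnalyzer's lowercase step, not its tokenizing or stop-word removal.

```csharp
using System;
using System.Globalization;
using Lucene.Net.Index;
using Lucene.Net.Search;

public static class FuzzyMultiFieldQuery
{
    const float MIN_SIMILARITY = 0.5f;
    const int PREFIX_LENGTH = 3;

    // Every word must fuzzily match at least one of the given fields.
    public static BooleanQuery Build(string searchString, string[] searchfields)
    {
        var finalQuery = new BooleanQuery();
        string[] terms = searchString.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string rawTerm in terms)
        {
            // StandardAnalyzer lowercased the terms at index time, so do the same here.
            string term = rawTerm.ToLower(CultureInfo.InvariantCulture);

            var subquery = new BooleanQuery();
            foreach (string field in searchfields)
            {
                var fuzzyQuery = new FuzzyQuery(new Term(field, term), MIN_SIMILARITY, PREFIX_LENGTH);
                subquery.Add(fuzzyQuery, BooleanClause.Occur.SHOULD);
            }
            finalQuery.Add(subquery, BooleanClause.Occur.MUST);
        }
        return finalQuery;
    }
}
```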
You want this line:
var queryTerm = new Term(term);
to look like this:
var queryTerm = new Term(field, term);
Right now you're searching field term (which probably doesn't exist) for the empty string (which will never be found).
