Complex Phrases and/or ComplexPhraseQueryParser in Lucene.NET

I am trying to search for fairly complex queries with Lucene.Net like
"inject* needle*" OR "point* thingy"~2
So basically I need wildcards in regular as well as proximity phrases. However, the basic Lucene.Net QueryParser gets rid of these wildcards.
I understand that ComplexPhraseQueryParser would work for that, but unfortunately it is not included in Lucene.Net.
Is there any way of constructing queries like this in Lucene.Net?

I ended up porting the ComplexPhraseQueryParser from Java to C#. It was a lot easier than expected and was a good exercise for learning C# a bit better.
I have provided the code below in case it is helpful to anyone else. Please note that it is still very Java-like code, as I am a lot more familiar with Java than I am with C# ;-)
/**
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
// Ported to C# from Java source at http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-misc/3.0.3/org/apache/lucene/queryParser/complexPhrase/ComplexPhraseQueryParser.java
using Lucene.Net.Analysis;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Search.Spans;
using System;
using System.Collections.Generic;
using Version = Lucene.Net.Util.Version;
public class ComplexPhraseQueryParser : QueryParser
{
private List<ComplexPhraseQuery> complexPhrases = null;
private Boolean isPass2ResolvingPhrases;
private ComplexPhraseQuery currentPhraseQuery = null;
public ComplexPhraseQueryParser(Version matchVersion, String f, Analyzer a) : base(matchVersion, f, a) { }
protected override Query GetFieldQuery(String field, String queryText, int slop)
{
ComplexPhraseQuery cpq = new ComplexPhraseQuery(field, queryText, slop);
// Add to the list of phrases to be parsed once we are through with this pass
complexPhrases.Add(cpq);
return cpq;
}
public override Query Parse(String query)
{
if (isPass2ResolvingPhrases)
{
RewriteMethod oldMethod = MultiTermRewriteMethod;
try
{
// Temporarily force BooleanQuery rewrite so that Parser will
// generate visible
// collection of terms which we can convert into SpanQueries.
// ConstantScoreRewrite mode produces an
// opaque ConstantScoreQuery object which cannot be interrogated for
// terms in the same way a BooleanQuery can.
// QueryParser is not guaranteed threadsafe anyway so this temporary
// state change should not
// present an issue
MultiTermRewriteMethod = MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE;
return base.Parse(query);
}
finally
{
MultiTermRewriteMethod = oldMethod;
}
}
// First pass - parse the top-level query recording any PhraseQuerys
// which will need to be resolved
complexPhrases = new List<ComplexPhraseQuery>();
Query q = base.Parse(query);
// Perform second pass, using this QueryParser to parse any nested
// PhraseQueries with different
// set of syntax restrictions (i.e. all fields must be same)
isPass2ResolvingPhrases = true;
try
{
using (IEnumerator<ComplexPhraseQuery> enumerator = complexPhrases.GetEnumerator())
{
while (enumerator.MoveNext())
{
currentPhraseQuery = enumerator.Current;
currentPhraseQuery.ParsePhraseElements(this);
}
}
}
finally
{
isPass2ResolvingPhrases = false;
}
return q;
}
// There is No "getTermQuery throws ParseException" method to override so
// unfortunately need
// to throw a runtime exception here if a term for another field is embedded
// in phrase query
protected override Query NewTermQuery(Term term)
{
if (isPass2ResolvingPhrases)
{
try
{
CheckPhraseClauseIsForSameField(term.Field);
}
catch (ParseException pe)
{
throw new SystemException("Error parsing complex phrase", pe);
}
}
return base.NewTermQuery(term);
}
// Helper method used to report on any clauses that appear in query syntax
private void CheckPhraseClauseIsForSameField(String field)
{
if (!field.Equals(currentPhraseQuery.Field))
{
throw new ParseException("Cannot have clause for field \"" + field
+ "\" nested in phrase " + " for field \"" + currentPhraseQuery.Field
+ "\"");
}
}
protected override Query GetWildcardQuery(String field, String termStr)
{
if (isPass2ResolvingPhrases)
{
CheckPhraseClauseIsForSameField(field);
}
return base.GetWildcardQuery(field, termStr);
}
protected override Query GetRangeQuery(String field, String part1, String part2, Boolean inclusive)
{
if (isPass2ResolvingPhrases)
{
CheckPhraseClauseIsForSameField(field);
}
return base.GetRangeQuery(field, part1, part2, inclusive);
}
protected override Query NewRangeQuery(String field, String part1, String part2,
Boolean inclusive)
{
if (isPass2ResolvingPhrases)
{
// Must use old-style RangeQuery in order to produce a BooleanQuery
// that can be turned into SpanOr clause
TermRangeQuery rangeQuery = new TermRangeQuery(field, part1, part2, inclusive, inclusive, RangeCollator);
rangeQuery.RewriteMethod = MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE;
return rangeQuery;
}
return base.NewRangeQuery(field, part1, part2, inclusive);
}
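// Declared as an override (as in the Java original) so the field check runs during pass 2
protected override Query GetFuzzyQuery(String field, String termStr, float minSimilarity)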
{
if (isPass2ResolvingPhrases)
{
CheckPhraseClauseIsForSameField(field);
}
return base.GetFuzzyQuery(field, termStr, minSimilarity);
}
/*
* Used to handle the query content in between quotes and produced Span-based
* interpretations of the clauses.
*/
class ComplexPhraseQuery : Query
{
public string Field { get; set; }
public string PhrasedQueryStringContents { get; set; }
public int SlopFactor { get; set; }
private Query Contents;
public ComplexPhraseQuery(string Field, string PhrasedQueryStringContents, int SlopFactor)
: base()
{
this.Field = Field;
this.PhrasedQueryStringContents = PhrasedQueryStringContents;
this.SlopFactor = SlopFactor;
}
// Called by ComplexPhraseQueryParser for each phrase after the main
// parse
// thread is through
public void ParsePhraseElements(QueryParser qp)
{
// TODO ensure that field-sensitivity is preserved ie the query
// string below is parsed as
// field+":("+phrasedQueryStringContents+")"
// but this will need code in rewrite to unwrap the first layer of
// boolean query
Contents = qp.Parse(PhrasedQueryStringContents);
}
public override Query Rewrite(IndexReader reader)
{
// ArrayList spanClauses = new ArrayList();
if (Contents is TermQuery)
{
return Contents;
}
// Build a sequence of Span clauses arranged in a SpanNear - child
// clauses can be complex
// Booleans e.g. nots and ors etc
int numNegatives = 0;
if (!(Contents is BooleanQuery))
{
throw new ArgumentException("Unknown query type \""
+ Contents.GetType()
+ "\" found in phrase query string \"" + PhrasedQueryStringContents
+ "\"");
}
BooleanQuery bq = (BooleanQuery)Contents;
BooleanClause[] bclauses = bq.GetClauses();
SpanQuery[] allSpanClauses = new SpanQuery[bclauses.Length];
// For all clauses e.g. one* two~
for (int i = 0; i < bclauses.Length; i++)
{
// HashSet bclauseterms=new HashSet();
Query qc = bclauses[i].Query;
// Rewrite this clause e.g one* becomes (one OR onerous)
qc = qc.Rewrite(reader);
if (bclauses[i].Occur.Equals(Occur.MUST_NOT))
{
numNegatives++;
}
if (qc is BooleanQuery)
{
List<SpanQuery> sc = new List<SpanQuery>();
AddComplexPhraseClause(sc, (BooleanQuery)qc);
if (sc.Count > 0)
{
allSpanClauses[i] = sc[0];
}
else
{
// Insert fake term e.g. phrase query was for "Fred Smithe*" and
// there were no "Smithe*" terms - need to
// prevent match on just "Fred".
allSpanClauses[i] = new SpanTermQuery(new Term(Field,
"Dummy clause because no terms found - must match nothing"));
}
}
else
{
if (qc is TermQuery)
{
TermQuery tq = (TermQuery)qc;
allSpanClauses[i] = new SpanTermQuery(tq.Term);
}
else
{
throw new ArgumentException("Unknown query type \""
+ qc.GetType()
+ "\" found in phrase query string \""
+ PhrasedQueryStringContents + "\"");
}
}
}
if (numNegatives == 0)
{
// The simple case - no negative elements in phrase
return new SpanNearQuery(allSpanClauses, SlopFactor, true);
}
// Complex case - we have mixed positives and negatives in the
// sequence.
// Need to return a SpanNotQuery
List<SpanQuery> positiveClauses = new List<SpanQuery>();
for (int j = 0; j < allSpanClauses.Length; j++)
{
if (!bclauses[j].Occur.Equals(Occur.MUST_NOT))
{
positiveClauses.Add(allSpanClauses[j]);
}
}
//SpanQuery[] includeClauses = positiveClauses.ToArray(new SpanQuery[positiveClauses.Count]);
SpanQuery[] includeClauses = positiveClauses.ToArray();
SpanQuery include = null;
if (includeClauses.Length == 1)
{
include = includeClauses[0]; // only one positive clause
}
else
{
// need to increase slop factor based on gaps introduced by
// negatives
include = new SpanNearQuery(includeClauses, SlopFactor + numNegatives,
true);
}
// Use sequence of positive and negative values as the exclude.
SpanNearQuery exclude = new SpanNearQuery(allSpanClauses, SlopFactor,
true);
SpanNotQuery snot = new SpanNotQuery(include, exclude);
return snot;
}
private void AddComplexPhraseClause(List<SpanQuery> spanClauses, BooleanQuery qc)
{
List<SpanQuery> ors = new List<SpanQuery>();
List<SpanQuery> nots = new List<SpanQuery>();
BooleanClause[] bclauses = qc.GetClauses();
// For all clauses e.g. one* two~
for (int i = 0; i < bclauses.Length; i++)
{
Query childQuery = bclauses[i].Query;
// select the list to which we will add these options
List<SpanQuery> chosenList = ors;
if (bclauses[i].Occur == Occur.MUST_NOT)
{
chosenList = nots;
}
if (childQuery is TermQuery)
{
TermQuery tq = (TermQuery)childQuery;
SpanTermQuery stq = new SpanTermQuery(tq.Term);
stq.Boost = tq.Boost;
chosenList.Add(stq);
}
else if (childQuery is BooleanQuery)
{
BooleanQuery cbq = (BooleanQuery)childQuery;
AddComplexPhraseClause(chosenList, cbq);
}
else
{
// TODO alternatively could call extract terms here?
throw new ArgumentException("Unknown query type:"
+ childQuery.GetType());
}
}
if (ors.Count == 0)
{
return;
}
SpanOrQuery soq = new SpanOrQuery(ors.ToArray());
if (nots.Count == 0)
{
spanClauses.Add(soq);
}
else
{
SpanOrQuery snqs = new SpanOrQuery(nots.ToArray());
SpanNotQuery snq = new SpanNotQuery(soq, snqs);
spanClauses.Add(snq);
}
}
public override String ToString(String field)
{
return "\"" + PhrasedQueryStringContents + "\"";
}
public override int GetHashCode()
{
const int prime = 31;
int result = 1;
result = prime * result + ((Field == null) ? 0 : Field.GetHashCode());
result = prime
* result
+ ((PhrasedQueryStringContents == null) ? 0
: PhrasedQueryStringContents.GetHashCode());
result = prime * result + SlopFactor;
return result;
}
public override Boolean Equals(Object obj)
{
if (this == obj)
return true;
if (obj == null)
return false;
if (GetType() != obj.GetType())
return false;
ComplexPhraseQuery other = (ComplexPhraseQuery)obj;
if (Field == null)
{
if (other.Field != null)
return false;
}
else if (!Field.Equals(other.Field))
return false;
if (PhrasedQueryStringContents == null)
{
if (other.PhrasedQueryStringContents != null)
return false;
}
else if (!PhrasedQueryStringContents
.Equals(other.PhrasedQueryStringContents))
return false;
if (SlopFactor != other.SlopFactor)
return false;
return true;
}
}
}

Related

OutOfMemoryException when updating a large list?

I have a large list and I would like to overwrite one value if required. To do this, I create two subsets of the list which seems to give me an OutOfMemoryException. Here is my code snippet:
if (ownRG != "")
{
List<string> maclist = ownRG.Split(',').ToList();
List<IVFile> temp = powlist.Where(a => maclist.Contains(a.Machine)).ToList();
powlist = powlist.Where(a => !maclist.Contains(a.Machine)).ToList(); // OOME Here
temp.ForEach(a => { a.ReportingGroup = ownRG; });
powlist.AddRange(temp);
}
Essentially I'm splitting the list into the part that needs updating and the part that doesn't, then I perform the update and put the list back together. This works fine for smaller lists, but breaks with an OutOfMemoryException on the third row within the if for a large list. Can I make this more efficient?
NOTE
powlist is the large list (>1m items). maclist only has between 1 and 10 items, but even with 1 item this breaks.

Solving your issue
Here is how to rearrange your code using the enumerator code from my answer:
if (!string.IsNullOrEmpty(ownRG))
{
var maclist = new CommaSeparatedStringEnumerable(ownRG);
var temp = powlist.Where(a => maclist.Contains(a.Machine));
foreach (var p in temp)
{
p.ReportingGroup = ownRG;
}
}
You should not use ToList in your code.
You don't need to remove the contents of temp from powlist (you are re-adding them anyway).
Streaming over a large comma-separated string
You can iterate over the list manually instead of doing what you do now, by looking for , characters and remembering the position of the last found one and the one before. This way the app does not need to store the entire set of items in memory at once.
Code example:
var str = "aaa,bbb,ccc";
var previousComma = -1;
var currentComma = 0;
for (; (currentComma = str.IndexOf(',', previousComma + 1)) != -1; previousComma = currentComma)
{
var currentItem = str.Substring(previousComma + 1, currentComma - previousComma - 1);
Console.WriteLine(currentItem);
}
var lastItem = str.Substring(previousComma + 1);
Console.WriteLine(lastItem);
Custom iterator
If you want to do it 'properly' in a fancy way, you can even write a custom enumerator:
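using System;
using System.Collections; // non-generic IEnumerator and IEnumerable
using System.Collections.Generic;
public class CommaSeparatedStringEnumerator : IEnumerator<string>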
{
int previousComma = -1;
int currentComma = -1;
string bigString = null;
bool atEnd = false;
public CommaSeparatedStringEnumerator(string s)
{
if (s == null)
throw new ArgumentNullException("s");
bigString = s;
this.Reset();
}
public string Current { get; private set; }
public void Dispose() { /* No need to do anything here */ }
object IEnumerator.Current { get { return this.Current; } }
public bool MoveNext()
{
if (atEnd)
return false;
atEnd = (currentComma = bigString.IndexOf(',', previousComma + 1)) == -1;
if (!atEnd)
Current = bigString.Substring(previousComma + 1, currentComma - previousComma - 1);
else
Current = bigString.Substring(previousComma + 1);
previousComma = currentComma;
return true;
}
public void Reset()
{
previousComma = -1;
currentComma = -1;
atEnd = false;
this.Current = null;
}
}
public class CommaSeparatedStringEnumerable : IEnumerable<string>
{
string bigString = null;
public CommaSeparatedStringEnumerable(string s)
{
if (s == null)
throw new ArgumentNullException("s");
bigString = s;
}
public IEnumerator<string> GetEnumerator()
{
return new CommaSeparatedStringEnumerator(bigString);
}
IEnumerator IEnumerable.GetEnumerator()
{
return this.GetEnumerator();
}
}
Then you can iterate over it like this:
var str = "aaa,bbb,ccc";
var enumerable = new CommaSeparatedStringEnumerable(str);
foreach (var item in enumerable)
{
Console.WriteLine(item);
}
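The same lazy enumeration can also be sketched with a C# iterator method, which lets the compiler generate the enumerator. This is only an illustrative alternative to the classes above; the names CommaSplitter and SplitLazily are made up for this example.

```csharp
using System;
using System.Collections.Generic;

public static class CommaSplitter
{
    // Validates eagerly, then hands off to the lazy iterator below.
    public static IEnumerable<string> SplitLazily(string s)
    {
        if (s == null)
            throw new ArgumentNullException("s");
        return Iterate(s);
    }

    // Yields one item at a time; no list of all items is ever built.
    private static IEnumerable<string> Iterate(string s)
    {
        int start = 0;
        int comma;
        while ((comma = s.IndexOf(',', start)) != -1)
        {
            yield return s.Substring(start, comma - start);
            start = comma + 1;
        }
        // Whatever follows the last comma (possibly an empty string) is the final item.
        yield return s.Substring(start);
    }
}

public static class Program
{
    public static void Main()
    {
        foreach (var item in CommaSplitter.SplitLazily("aaa,bbb,ccc"))
        {
            Console.WriteLine(item);
        }
    }
}
```

Like the hand-written enumerator, this yields an empty string for a trailing comma, so "a," produces "a" followed by "".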
Other thoughts
Can I make this more efficient?
Yes, you can. I suggest either working with a more efficient data format (take a look at databases, XML, JSON, etc., depending on your needs) or, if you really want to work with comma-separated items, using my code examples above.

There's no need to create a bunch of sub-lists from powlist and reconstruct it. Simply loop over the powlist and update the ReportingGroup property accordingly.
var maclist = new HashSet<string>( ownRG.Split(',') );
foreach( var item in powlist) {
if( maclist.Contains( item.Machine ) ){
item.ReportingGroup = ownRG;
}
}
Since this changes powlist in place, you won't allocate any extra memory and shouldn't run into an OutOfMemoryException.

In a loop find the next ',' char. Take the substring between the ',' and the previous ',' position. At the end of the loop save a reference to the previous ',' position (which is initially set to 0). So you parse the items one-by-one rather than all at once.

You can try looping the items of your lists, but this will increase processing time.
foreach(var item in powlist)
{
//do your operations
}

Code folding in RichTextBox

I am working on a Code Editor derived from Winforms RichTextBox using C#. I have already implemented autocompletion and syntax highlighting, but code folding needs a somewhat different approach. What I want to achieve is:
The code below:
public static SomeFunction(EventArgs e)
{
//Some code
//Some code
//Some code
//Some code
//Some code
//Some code
}
Should become:
public static SomeFunction(EventArgs e)[...]
Where [...] stands for the collapsed code, which is displayed in a tooltip when you hover over the [...]
Any ideas or suggestions how to do it, either using Regex or procedural code?

I have created a parser that will return the indices of code folding locations.
Folding delimiters are defined by regular expressions.
You can specify a start and ending index so that you don't have to check the entire code when one area is updated.
It will throw exceptions if the code is not properly formatted; feel free to change that behavior. One alternative could be that it keeps moving up the stack until an appropriate end token is found.
Fold Finder
public class FoldFinder
{
public static FoldFinder Instance { get; private set; }
static FoldFinder()
{
Instance = new FoldFinder();
}
public List<SectionPosition> Find(string code, List<SectionDelimiter> delimiters, int start = 0,
int end = -1)
{
List<SectionPosition> positions = new List<SectionPosition>();
Stack<SectionStackItem> stack = new Stack<SectionStackItem>();
int regexGroupIndex;
bool isStartToken;
SectionDelimiter matchedDelimiter;
SectionStackItem currentItem;
Regex scanner = RegexifyDelimiters(delimiters);
foreach (Match match in scanner.Matches(code, start))
{
// Each delimiter contributes three consecutive capture groups: the first wraps the whole
// delimiter, the second is its Start pattern and the third is its End pattern.
regexGroupIndex =
match.Groups.Cast<Group>().Select((g, i) => new {
Success = g.Success,
Index = i
})
.Where(r => r.Success && r.Index > 0).First().Index;
matchedDelimiter = delimiters[(regexGroupIndex - 1) / 3];
isStartToken = match.Groups[regexGroupIndex + 1].Success;
if (isStartToken)
{
stack.Push(new SectionStackItem()
{
Delimter = matchedDelimiter,
Position = new SectionPosition() { Start = match.Index }
});
}
else
{
currentItem = stack.Pop();
if (currentItem.Delimter == matchedDelimiter)
{
currentItem.Position.End = match.Index + match.Length;
positions.Add(currentItem.Position);
// if searching for an end, and we've passed it, and the stack is empty then quit.
if (end > -1 && currentItem.Position.End >= end && stack.Count == 0) break;
}
else
{
throw new Exception(string.Format("Invalid Ending Token at {0}", match.Index));
}
}
}
if (stack.Count > 0) throw new Exception("Not enough closing symbols.");
return positions;
}
public Regex RegexifyDelimiters(List<SectionDelimiter> delimiters)
{
return new Regex(
string.Join("|", delimiters.Select(d =>
string.Format("(({0})|({1}))", d.Start, d.End))));
}
}
public class SectionStackItem
{
public SectionPosition Position;
public SectionDelimiter Delimter;
}
public class SectionPosition
{
public int Start;
public int End;
}
public class SectionDelimiter
{
public string Start;
public string End;
}
Sample Find
The sample below matches folds delimited by {,}, [,], and right after a symbol until a ;. I don't see too many IDEs that fold for each line, but it might be handy for long pieces of code, like a LINQ query.
var sectionPositions =
FoldFinder.Instance.Find("abc { def { qrt; ghi [ abc ] } qrt }", new List<SectionDelimiter>(
new SectionDelimiter[3] {
new SectionDelimiter() { Start = "\\{", End = "\\}" },
new SectionDelimiter() { Start = "\\[", End = "\\]" },
new SectionDelimiter() { Start = "(?<=\\[|\\{|;|^)[^[{;]*(?=;)", End = ";" },
}));
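The stack-based matching that the parser relies on can be seen in isolation in a smaller sketch. The version below is a simplification written for illustration only: it handles just { and }, uses no regular expressions, and ignores braces inside strings or comments. The names BraceFolds and Find are hypothetical and not part of the code above.

```csharp
using System;
using System.Collections.Generic;

public static class BraceFolds
{
    // Returns one (start, end) pair per matched brace pair; end is exclusive.
    // Inner folds are reported before the folds that enclose them.
    public static List<KeyValuePair<int, int>> Find(string code)
    {
        var folds = new List<KeyValuePair<int, int>>();
        var openPositions = new Stack<int>();
        for (int i = 0; i < code.Length; i++)
        {
            if (code[i] == '{')
            {
                openPositions.Push(i);
            }
            else if (code[i] == '}')
            {
                if (openPositions.Count == 0)
                    throw new FormatException("Unmatched closing brace at " + i);
                folds.Add(new KeyValuePair<int, int>(openPositions.Pop(), i + 1));
            }
        }
        if (openPositions.Count > 0)
            throw new FormatException("Not enough closing braces.");
        return folds;
    }
}

public static class Program
{
    public static void Main()
    {
        foreach (var fold in BraceFolds.Find("abc { def { qrt } ghi }"))
        {
            Console.WriteLine(fold.Key + "-" + fold.Value);
        }
        // prints:
        // 10-17
        // 4-23
    }
}
```

Each pair can then be used to hide the text between Start and End in the editor and to show it in the tooltip instead.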

dynamic flexibility in C#

I have recently started learning programming and chose .NET with Visual Studio Express. I am trying to write a CSV Parser as a learning experience and it's giving me a lot more trouble than I expected. I am starting with the reader. One thing I am doing differently in my parser is that I am not using quotes. I am escaping commas with a backslash, backslashes with a backslash, and line breaks with a backslash. For example, if a comma is preceded by an even number of backslashes it is a field and I halve any blocks of backslashes. If it's odd, it's not end of field and I still halve blocks of backslashes. I'm not sure how robust this will be if I can ever get it working, except I'm only learning at this point and I'm looking at it mostly as an exercise in manipulating data structures.
I have a question in reference to the code snippet at the bottom of this post and how to make it not so static and limiting and still compile and run for me.
The line of code that reads:
var contents = (String)fileContents;
I keep trying to make it more dynamic to increase flexibility and make it something like this:
var contents = (otherVariableThatCouldChangeTypeAtRuntime.GetType())fileContents;
Is there something I can do to get it to do this and still compile? Maybe something like Option Infer from VB.NET might help, except I can't find that.
Also, I have written this in VB.NET as well. It seems to me that VB.NET allows me a considerably more dynamic style than what I've posted below, such as not having to type var over and over again and not having to keep casting my index counting variable into an integer over and over again if I shut off Option Strict and Option Explicit as well as turn on Option Infer. For example, C# won't let me type something analogous to the following VB.NET code even though I know the methods and properties I will be calling at run-time will be there at run-time.
Dim contents As Object = returnObjectICantDetermineAtComplieTime()
contents.MethodIKnowWillBeThereAtRunTime()
Can I do these things in C#? Anyways, here's the code and thanks in advance for any responses.
public class Widget
{
public object ID { get; set; }
public object PartNumber { get; set; }
public object VendorID { get; set; }
public object TypeID { get; set; }
public object KeyMarkLoc { get; set; }
public Widget() { }
}
public object ReadFromFile(object source)
{
var fileContents = new FileService().GetFileContents(source);
object records = null;
if (fileContents == null)
return null;
var stringBuffer = "";
var contents = (String)fileContents;
while (contents.Length > 0 && contents != "\r\n")
{
for (object i = 0; (int)i < contents.Length; i=(int)i+1 )
{
object character = contents[(int)i];
if (!stringBuffer.EndsWith("\r\n"))
{
stringBuffer += character.ToString();
}
if (stringBuffer.EndsWith("\r\n"))
{
var bSlashes = getBackSlashes(stringBuffer.Substring(0, stringBuffer.Length - 4));
stringBuffer = stringBuffer.Substring(0, stringBuffer.Length - 4);
if ((int)bSlashes % 2 == 0)
{
break;
}
}
}
contents = contents.Substring(stringBuffer.Length+2);
records = records == null ? getIncrementedList(new List<object>(), getNextRecord(getFields(stringBuffer))) : getIncrementedList((List<object>)records, getNextRecord(getFields(stringBuffer)));
}
return records;
}
private Widget getNextRecord(object[] fields)
{
var personStudent = new Widget();
personStudent.ID = fields[0];
personStudent.PartNumber = fields[1];
personStudent.VendorID = fields[2];
personStudent.TypeID = fields[3];
personStudent.KeyMarkLoc = fields[4];
return personStudent;
}
private object[] getFields(object buffer)
{
var fields = new object[5];
var intFieldCount = 0;
var fieldVal = "";
var blocks = buffer.ToString().Split(',');
foreach (var block in blocks)
{
var bSlashes = getBackSlashes(block);
var intRemoveCount = (int)bSlashes / 2;
if ((int)bSlashes % 2 == 0) // Delimiter
{
fieldVal += block.Substring(0, block.Length - intRemoveCount);
fields[intFieldCount] += fieldVal;
intFieldCount++;
fieldVal = "";
}
else // Part of Field
{
fieldVal += block.Substring(0, block.Length - intRemoveCount - 1) + ",";
}
}
return fields;
}
private object getBackSlashes(object block)
{
object bSlashes = block.ToString().Length == 0 ? new int?(0) : null;
for (object i = block.ToString().Length - 1; (int)i>-1; i=(int)i-1)
{
if (block.ToString()[(int)i] != '\\') return bSlashes = bSlashes == null ? 0 : bSlashes;
bSlashes = bSlashes == null ? 1 : (int)bSlashes + 1;
}
return bSlashes;
}
}
Here is the web service code.
[WebMethod]
public object GetFileContents(object source)
{
return File.ReadAllText(source.ToString());
}

Dim contents As Object = returnObjectICantDetermineAtComplieTime()
contents.MethodIKnowWillBeThereAtRunTime()
You can do this with the dynamic type.
For more information, see: http://msdn.microsoft.com/en-us/library/dd264736.aspx
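As a rough C# counterpart to the VB.NET snippet, here is a minimal sketch of late binding through dynamic (available from C# 4). The class DynamicDemo and its methods are hypothetical stand-ins made up for this example; the returned string merely plays the role of an object whose type is not known at compile time.

```csharp
using System;
using Microsoft.CSharp.RuntimeBinder;

public static class DynamicDemo
{
    // Hypothetical stand-in for a method whose concrete return type is only known at run time.
    public static object ReturnObjectOnlyKnownAtRunTime()
    {
        return "some file contents";
    }

    // The call to ToUpper is bound at run time against the actual type of value.
    public static string Shout(object value)
    {
        dynamic d = value;
        return d.ToUpper();
    }
}

public static class Program
{
    public static void Main()
    {
        dynamic contents = DynamicDemo.ReturnObjectOnlyKnownAtRunTime();
        Console.WriteLine(DynamicDemo.Shout(contents)); // prints "SOME FILE CONTENTS"

        try
        {
            // This compiles, but string has no such member, so binding fails at run time.
            contents.MethodThatDoesNotExist();
        }
        catch (RuntimeBinderException ex)
        {
            Console.WriteLine("Binding failed: " + ex.Message);
        }
    }
}
```

The trade-off is the same as with Option Strict off in VB.NET: mistakes in member names surface as run-time exceptions rather than compile errors.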

Logical Inversion of Symbol Tree

I have a class, Symbol_Group, that represents an invertible expression of the nature AB(C+DE) + FG. Symbol_Group contains a List<List<iSymbol>>, where iSymbol is an interface applied to Symbol_Group, and Symbol.
The above equation would be represented as A,B,Sym_Grp + F,G; Sym_Grp = C + D,E, where each + represents a new List<iSymbol>
I need to be able to invert and expand this equation using an algorithm that can handle any amount of nesting, and any number of symbols ANDed or ORed together, to produce a set of Symbol_Group, with each containing a unique expansion. For the above equation the answer set would be !A!F; !B!F; !C!D!F; !C!E!F; !A!G; !B!G; !C!D!G; !C!E!G;
I know that I will need to use recursion, but I have had very little experience with it. Any help figuring out this algorithm would be appreciated.

Unless you are somehow required to use a List<List<iSymbol>>, I recommend switching to a different class structure, with a base class (or interface) Expression and subclasses (or implementors) SymbolExpression, NotExpression, OrExpression, and AndExpression. A SymbolExpression contains a single symbol; a NotExpression contains one Expression, and OrExpression and AndExpression contain two expressions each. This is a much more standard structure for working with mathematical expressions, and it is probably simpler to perform the transformations on it.
With the above classes, you can model any expression as a binary tree. Negate the expression by replacing the root by a NotExpression whose child is the original root. Then, traverse the tree with a depth-first search, and whenever you hit a NotExpression whose child is an OrExpression or an AndExpression, you can replace that by an AndExpression or an OrExpression (respectively) whose children are NotExpressions with the original children below them. You might also want to eliminate double negations (look for NotExpressions whose child is a NotExpression, and remove both).
(Whether this answer is understandable probably depends on how comfortable you are with working with trees. Let me know if you need clarification.)
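The class structure and the negation step described above could look roughly like the sketch below. It is an illustration under assumptions, not a finished design: the class names follow the answer, while Inverter and PushNegations are made-up names. It only pushes negations down to the symbols (De Morgan's laws plus double-negation removal); expanding the result into a list of product terms would be a further step.

```csharp
using System;

public abstract class Expression { }

public sealed class SymbolExpression : Expression
{
    public readonly string Name;
    public SymbolExpression(string name) { Name = name; }
    public override string ToString() { return Name; }
}

public sealed class NotExpression : Expression
{
    public readonly Expression Child;
    public NotExpression(Expression child) { Child = child; }
    public override string ToString() { return "!" + Child; }
}

public sealed class AndExpression : Expression
{
    public readonly Expression Left, Right;
    public AndExpression(Expression left, Expression right) { Left = left; Right = right; }
    public override string ToString() { return "(" + Left + " & " + Right + ")"; }
}

public sealed class OrExpression : Expression
{
    public readonly Expression Left, Right;
    public OrExpression(Expression left, Expression right) { Left = left; Right = right; }
    public override string ToString() { return "(" + Left + " | " + Right + ")"; }
}

public static class Inverter
{
    // Recursively rewrites the tree so that NOT only ever sits directly above a symbol.
    public static Expression PushNegations(Expression e)
    {
        var not = e as NotExpression;
        if (not != null)
        {
            var innerNot = not.Child as NotExpression;
            if (innerNot != null)
                return PushNegations(innerNot.Child); // !!x becomes x
            var innerAnd = not.Child as AndExpression;
            if (innerAnd != null) // !(a & b) becomes !a | !b
                return new OrExpression(
                    PushNegations(new NotExpression(innerAnd.Left)),
                    PushNegations(new NotExpression(innerAnd.Right)));
            var innerOr = not.Child as OrExpression;
            if (innerOr != null) // !(a | b) becomes !a & !b
                return new AndExpression(
                    PushNegations(new NotExpression(innerOr.Left)),
                    PushNegations(new NotExpression(innerOr.Right)));
            return e; // NOT directly above a symbol stays as it is
        }
        var and = e as AndExpression;
        if (and != null)
            return new AndExpression(PushNegations(and.Left), PushNegations(and.Right));
        var or = e as OrExpression;
        if (or != null)
            return new OrExpression(PushNegations(or.Left), PushNegations(or.Right));
        return e; // plain symbol
    }
}

public static class Program
{
    public static void Main()
    {
        // A(B + C), negated at the root
        Expression original = new AndExpression(
            new SymbolExpression("A"),
            new OrExpression(new SymbolExpression("B"), new SymbolExpression("C")));
        Expression inverted = Inverter.PushNegations(new NotExpression(original));
        Console.WriteLine(inverted); // prints "(!A | (!B & !C))"
    }
}
```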

After much work, this is the method I used to get the minimum terms of inversion.
public List<iSymbol> GetInvertedGroup()
{
TrimSymbolList();
List<List<iSymbol>> symbols = this.CopyListMembers(Symbols);
List<iSymbol> SymList;
while (symbols.Count > 1)
{
symbols.Add(MultiplyLists(symbols[0], symbols[1]));
symbols.RemoveRange(0, 2);
}
SymList = symbols[0];
for(int i=0;i<symbols[0].Count;i++)
{
if (SymList[i] is Symbol)
{
Symbol sym = SymList[i] as Symbol;
SymList.RemoveAt(i--);
Symbol_Group symgrp = new Symbol_Group(null);
symgrp.AddSymbol(sym);
SymList.Add(symgrp);
}
}
for (int i = 0; i < SymList.Count; i++)
{
if (SymList[i] is Symbol_Group)
{
Symbol_Group SymGrp = SymList[i] as Symbol_Group;
if (SymGrp.Symbols.Count > 1)
{
List<iSymbol> list = SymGrp.GetInvertedGroup();
SymList.RemoveAt(i--);
AddElementsOf(list, SymList);
}
}
}
return SymList;
}
public List<iSymbol> MultiplyLists(List<iSymbol> L1, List<iSymbol> L2)
{
List<iSymbol> Combined = new List<iSymbol>(L1.Count + L2.Count);
foreach (iSymbol S1 in L1)
{
foreach (iSymbol S2 in L2)
{
Symbol_Group newGrp = new Symbol_Group(null);
newGrp.AddSymbol(S1);
newGrp.AddSymbol(S2);
Combined.Add(newGrp);
}
}
return Combined;
}
This resulted in a List of Groups of Symbols, with each group representing one OR term in the final result (e.g. !A!F). Some further code was used to reduce this to a List<List<Symbol>>, as there was a reasonable amount of nesting in the answer. To reduce it, I used:
public List<List<Symbol>> ReduceList(List<iSymbol> List)
{
List<List<Symbol>> Output = new List<List<Symbol>>(List.Count);
foreach (iSymbol iSym in List)
{
if (iSym is Symbol_Group)
{
List<Symbol> L = new List<Symbol>();
(iSym as Symbol_Group).GetAllSymbols(L);
Output.Add(L);
}
else
{
throw (new Exception());
}
}
return Output;
}
public void GetAllSymbols(List<Symbol> List)
{
foreach (List<iSymbol> SubList in Symbols)
{
foreach (iSymbol iSym in SubList)
{
if (iSym is Symbol)
{
List.Add(iSym as Symbol);
}
else if (iSym is Symbol_Group)
{
(iSym as Symbol_Group).GetAllSymbols(List);
}
else
{
throw(new Exception());
}
}
}
}
Hope this helps someone else!

I came to this simpler solution after a bit of rejigging. I hope it helps out somebody else with a similar problem! This is the class structure (plus a few other properties)
public class SymbolGroup : iSymbol
{
public SymbolGroup(SymbolGroup Parent, SymRelation Relation)
{
Symbols = new List<iSymbol>();
this.Parent = Parent;
SymbolRelation = Relation;
if (SymbolRelation == SymRelation.AND)
Name = "AND Group";
else
Name = "OR Group";
}
public int Depth
{
get
{
foreach (iSymbol s in Symbols)
{
if (s is SymbolGroup)
{
return (s as SymbolGroup).Depth + 1;
}
}
return 1;
}
}
}
The method of inversion is also contained within this class. It replaces an unexpanded group in the results list with all of the expanded results of that group. It only strips away one level at a time.
public List<SymbolGroup> InvertGroup()
{
List<SymbolGroup> Results = new List<SymbolGroup>();
foreach (iSymbol s in Symbols)
{
if (s is SymbolGroup)
{
SymbolGroup sg = s as SymbolGroup;
sg.Parent = null;
Results.Add(s as SymbolGroup);
}
else if (s is Symbol)
{
SymbolGroup sg = new SymbolGroup(null, SymRelation.AND);
sg.AddSymbol(s);
Results.Add(sg);
}
}
bool AllChecked = false;
while (!AllChecked)
{
AllChecked = true;
for(int i=0;i<Results.Count;i++)
{
SymbolGroup result = Results[i];
if (result.Depth > 1)
{
AllChecked = false;
Results.RemoveAt(i--);
}
else
continue;
if (result.SymbolRelation == SymRelation.OR)
{
Results.AddRange(result.MultiplyOut());
continue;
}
for(int j=0;j<result.nSymbols;j++)
{
iSymbol s = result.Symbols[j];
if (s is SymbolGroup)
{
result.Symbols.RemoveAt(j--); //removes the symbolgroup that is being replaced, so that the rest of the group can be added to the expansion.
AllChecked = false;
SymbolGroup subResult = s as SymbolGroup;
if(subResult.SymbolRelation == SymRelation.OR)
{
List<SymbolGroup> newResults;
newResults = subResult.MultiplyOut();
foreach(SymbolGroup newSg in newResults)
{
newSg.Symbols.AddRange(result.Symbols);
}
Results.AddRange(newResults);
}
break;
}
}
}
}
return Results;
}

Is there an equivalent to the Scanner class in C# for strings?

In Java I can pass a Scanner a string and then I can do handy things like, scanner.hasNext() or scanner.nextInt(), scanner.nextDouble() etc.
This allows some pretty clean code for parsing a string that contains rows of numbers.
How is this done in C# land?
If you had a string that say had:
"0 0 1 22 39 0 0 1 2 33 33"
In Java I would pass that to a scanner and do a
while(scanner.hasNext())
myArray[i++] = scanner.nextInt();
Or something very similar. What is the C#-ish way to do this?

I'm going to add this as a separate answer because it's quite distinct from the answer I already gave. Here's how you could start creating your own Scanner class:
class Scanner : System.IO.StringReader
{
    string currentWord;
    public Scanner(string source) : base(source)
    {
        readNextWord();
    }
    private void readNextWord()
    {
        System.Text.StringBuilder sb = new System.Text.StringBuilder();
        char nextChar;
        int next;
        // Skip whitespace before the word, so leading or repeated separators are ignored.
        while ((this.Peek() >= 0) && (char.IsWhiteSpace((char)this.Peek())))
            this.Read();
        // Read characters until the next whitespace or the end of the input.
        do
        {
            next = this.Read();
            if (next < 0)
                break;
            nextChar = (char)next;
            if (char.IsWhiteSpace(nextChar))
                break;
            sb.Append(nextChar);
        } while (true);
        if (sb.Length > 0)
            currentWord = sb.ToString();
        else
            currentWord = null;
    }
    public bool hasNextInt()
    {
        if (currentWord == null)
            return false;
        int dummy;
        return int.TryParse(currentWord, out dummy);
    }
    public int nextInt()
    {
        try
        {
            return int.Parse(currentWord);
        }
        finally
        {
            readNextWord();
        }
    }
    public bool hasNextDouble()
    {
        if (currentWord == null)
            return false;
        double dummy;
        return double.TryParse(currentWord, out dummy);
    }
    public double nextDouble()
    {
        try
        {
            return double.Parse(currentWord);
        }
        finally
        {
            readNextWord();
        }
    }
    public bool hasNext()
    {
        return currentWord != null;
    }
}

Using part of the answers already given, I've created a StringReader that can extract Enum and any data type that implements IConvertible.
Usage
using (var reader = new PacketReader("1 23 ErrorOk StringValue 15.22"))
{
    var index = reader.ReadNext<int>();
    var count = reader.ReadNext<int>();
    var result = reader.ReadNext<ErrorEnum>();
    var data = reader.ReadNext<string>();
    var responseTime = reader.ReadNext<double>();
}
Implementation
public class PacketReader : StringReader
{
    public PacketReader(string s)
        : base(s)
    {
    }
    public T ReadNext<T>() where T : IConvertible
    {
        // Skip whitespace left over from the previous token, so string values
        // do not pick up a leading space and repeated separators are tolerated.
        while (Peek() >= 0 && char.IsWhiteSpace((char)Peek()))
            Read();
        var sb = new StringBuilder();
        // Read characters up to the next whitespace or the end of the input.
        while (Peek() >= 0 && !char.IsWhiteSpace((char)Peek()))
            sb.Append((char)Read());
        var value = sb.ToString();
        var type = typeof(T);
        if (type.IsEnum)
            return (T)Enum.Parse(type, value);
        return (T)((IConvertible)value).ToType(typeof(T), System.Globalization.CultureInfo.CurrentCulture);
    }
}

While this isn't the exact same fundamental concept, what you're looking for can be done with this lambda expression:
string foo = "0 0 1 22 39 0 0 1 2 33 33";
int[] data = foo.Split(' ').Select(p => int.Parse(p)).ToArray();
What this does is first Split the string, using a space as a delimiter. The Select function then allows you to specify an alias for a given member in the array (which I referred to as 'p' in this example), then perform an operation on that member to give a final result. The ToArray() call then turns this abstract enumerable class into a concrete array.
So in the end, this splits the string, converts each element into an int, and populates an int[] with the resulting values.
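One caveat with Split(' '): consecutive spaces produce empty strings, and int.Parse rejects an empty string with a FormatException. A minimal sketch of a more tolerant variant follows; the class and method names (SplitDemo, ParseInts) are made up for illustration and are not part of the answer above.

```csharp
using System;
using System.Linq;

public static class SplitDemo
{
    // Hypothetical helper: parses whitespace-separated integers,
    // ignoring empty entries caused by repeated separators.
    public static int[] ParseInts(string input)
    {
        return input
            .Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(s => int.Parse(s)) // still throws FormatException on non-numeric tokens
            .ToArray();
    }
}
```

For example, SplitDemo.ParseInts("0 0  1 22") skips the empty entry between the two spaces instead of throwing.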

To my knowledge, there are no built-in classes in the framework for doing this. You would have to roll your own.
That would not be too hard. A nice C# version might implement IEnumerable so you could say:
var scanner = new Scanner<int>(yourString);
foreach(int n in scanner)
; // your code
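A rough sketch of what such a Scanner<T> might look like is below. It is only one possible shape: the answer above does not define the class, so the constructor, the IConvertible constraint, and the use of Convert.ChangeType with the invariant culture are all assumptions made for illustration.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical Scanner<T>: splits the input on whitespace and converts each token to T.
public class Scanner<T> : IEnumerable<T> where T : IConvertible
{
    private readonly string source;

    public Scanner(string source)
    {
        this.source = source ?? string.Empty;
    }

    public IEnumerator<T> GetEnumerator()
    {
        // A null separator array means "split on whitespace"; empty entries are dropped.
        string[] tokens = source.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        foreach (string token in tokens)
        {
            // Throws FormatException if a token is not a valid value of T.
            yield return (T)Convert.ChangeType(token, typeof(T), CultureInfo.InvariantCulture);
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
```

Because tokens are converted lazily inside the iterator, a malformed token only raises an exception when the foreach loop reaches it.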

To get as close as possible to your syntax, this'll work if you're only interested in one type ("int" in the example):
static void Main(string[] args)
{
    if (args.Length == 0) { args = new string[] { "3", "43", "6" }; }
    IEnumerator<int> scanner = (from arg in args select int.Parse(arg)).GetEnumerator();
    while (scanner.MoveNext())
    {
        Console.Write("{0} ", scanner.Current);
    }
}
Here's an even more whiz-bang version that allows you to access any type that is supported by string's IConvertible implementation:
static void Main(string[] args)
{
    if (args.Length == 0) { args = new string[] { "3", "43", "6" }; }
    var scanner = args.Select<string, Func<Type, Object>>((string s) => {
        return (Type t) =>
            ((IConvertible)s).ToType(t, System.Globalization.CultureInfo.InvariantCulture);
    }).GetEnumerator();
    while (scanner.MoveNext())
    {
        Console.Write("{0} ", scanner.Current(typeof(int)));
    }
}
Just pass a different type to the "typeof" operator in the while loop to choose the type.
Both versions require C# 3.0 and .NET Framework 3.5 or later, since they rely on LINQ and lambda expressions.

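You could use LINQ to accomplish this like so:
string text = "0 0 1 22 39 0 0 1 2 33 33";
// Note: this filters individual characters, so "22" is seen as two separate digits.
// Write() is a placeholder for your own code, not a built-in method.
text.Where(i => char.IsNumber(i)).Write(); // do something useful here...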

I would do this in one of a couple ways depending on whether 1) you are using the latest .NET framework with LINQ support and 2) you know the values are valid integers. Here's a function to demonstrate both:
int[] ParseIntArray(string input, bool validateRequired)
{
    if (validateRequired)
    {
        string[] split = input.Split();
        List<int> result = new List<int>(split.Length);
        int parsed;
        for (int inputIdx = 0; inputIdx < split.Length; inputIdx++)
        {
            if (int.TryParse(split[inputIdx], out parsed))
                result.Add(parsed);
        }
        return result.ToArray();
    }
    else
        return (from i in input.Split()
                select int.Parse(i)).ToArray();
}
Based on comments in other answer(s), I assume you need the validation. After reading those comments, I think the closest thing you'll get is int.TryParse and double.TryParse, which is kind of a combination of hasNextInt and nextInt (or a combination of hasNextDouble and nextDouble).
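To illustrate how TryParse can stand in for the hasNextInt/nextInt and hasNextDouble/nextDouble pairs, here is a small sketch that sorts tokens by the first type they parse as. The TokenClassifier class and its Classify method are hypothetical names, and the choice of the invariant culture is an assumption; adjust it if your input uses a different number format.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class TokenClassifier
{
    // Hypothetical helper: sorts whitespace-separated tokens into ints, doubles, and leftovers.
    public static void Classify(string input, List<int> ints, List<double> doubles, List<string> others)
    {
        string[] tokens = input.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        foreach (string token in tokens)
        {
            int i;
            double d;
            // Each TryParse call both tests the token and returns the parsed value.
            if (int.TryParse(token, NumberStyles.Integer, CultureInfo.InvariantCulture, out i))
                ints.Add(i);
            else if (double.TryParse(token, NumberStyles.Float, CultureInfo.InvariantCulture, out d))
                doubles.Add(d);
            else
                others.Add(token);
        }
    }
}
```

Trying int before double matters here: every integer token would also parse as a double, so the more specific type has to be checked first.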
