How to make common prefixes for regex word stemming?

I have an array of words that I need to run a regex find-and-replace against, and sometimes this array can be thousands of words long. I've tested and found that stemming the words using common prefixes is much faster than listing them individually. That is, ^(where|why)$ is slower than ^wh(ere|y)$. The difference isn't noticeable in such a short example, but it is considerable when there are thousands of alternatives and the subject string is long.
So I'm looking for a way to do this stemming automatically, for instance to convert a string[] { "what", "why", "where", "when", "which" } into wh(at|y|e(re|n)|i(ch))
Is there a recognized algorithm that does this? If not, how would you go about it? It seems to need recursion, but I can't quite get my head round how to do it. I have a method I wrote that works to a limited extent, but it's inelegant, 60 lines long, and uses multiple nested foreach loops, so it's a future maintenance nightmare. I'm sure there's a much better way; if anyone could point me in the right direction, that'd be much appreciated.

This code should work:
using System.Collections.Generic;
using System.Linq;

public static class StemmingUtilities
{
    // One trie node per character; the root has a null Value.
    private class Node
    {
        public char? Value { get; private set; }
        public Node Parent { get; private set; }
        public List<Node> Children { get; private set; }
        public Node(char? c, Node parent)
        {
            this.Value = c;
            this.Parent = parent;
            this.Children = new List<Node>();
        }
    }

    public static string GetRegex(IEnumerable<string> tokens)
    {
        // Build a prefix tree (trie) from all tokens.
        var root = new Node(null, null);
        foreach (var token in tokens)
        {
            var current = root;
            for (int i = 0; i < token.Length; i++)
            {
                char c = token[i];
                var node = current.Children.FirstOrDefault(x => x.Value.Value == c);
                if (node == null)
                {
                    node = new Node(c, current);
                    current.Children.Add(node);
                }
                current = node;
            }
        }
        return BuildRexp(root);
    }

    // Recursively emit the trie as a regex: siblings become alternatives.
    private static string BuildRexp(Node root)
    {
        string s = "";
        bool addBracket = root.Children.Count > 1;
        // uncomment the following line to avoid wrapping the whole pattern in brackets (which happens when the root has multiple children)
        // addBracket = addBracket && (root.Parent != null);
        if (addBracket)
            s += "(";
        for (int i = 0; i < root.Children.Count; i++)
        {
            var child = root.Children[i];
            s += child.Value;
            s += BuildRexp(child);
            if (i < root.Children.Count - 1)
                s += "|";
        }
        if (addBracket)
            s += ")";
        return s;
    }
}
Usage:
var toStem1 = new[] { "what", "why", "where", "when", "which" };
string reg1 = StemmingUtilities.GetRegex(toStem1);
// reg1 = "wh(at|y|e(re|n)|ich)"
string[] toStem2 = new[] { "why", "abc", "what", "where", "apple", "when" };
string reg2 = StemmingUtilities.GetRegex(toStem2);
// reg2 = "(wh(y|at|e(re|n))|a(bc|pple))"
EDIT:
to get reg2 = "wh(y|at|e(re|n))|a(bc|pple)" i.e. without the first wrapping brackets, just uncomment the marked line in BuildRexp method.
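A possible refinement, offered as a sketch rather than a drop-in replacement: the trie above does not record where a word ends, so an input such as { "a", "ab" } collapses to ab and no longer matches "a", and characters like . or + are emitted unescaped. The variant below marks word ends, makes the tail optional in that case, and passes every character through Regex.Escape. The names PrefixRegexSketch, TrieNode, Build and Emit are made up for this illustration, and the use of non-capturing groups and a SortedDictionary (which orders alternatives by character instead of by insertion) are my own choices, not part of the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PrefixRegexSketch
{
    // Hypothetical trie node: children keyed by character, plus an end-of-word flag.
    private sealed class TrieNode
    {
        public readonly SortedDictionary<char, TrieNode> Children =
            new SortedDictionary<char, TrieNode>();
        public bool IsWordEnd;
    }

    public static string Build(IEnumerable<string> words)
    {
        var root = new TrieNode();
        foreach (var word in words)
        {
            if (string.IsNullOrEmpty(word)) continue; // empty words are skipped
            var node = root;
            foreach (char c in word)
            {
                if (!node.Children.TryGetValue(c, out var next))
                {
                    next = new TrieNode();
                    node.Children.Add(c, next);
                }
                node = next;
            }
            node.IsWordEnd = true; // remember that a word stops here
        }
        return Emit(root);
    }

    private static string Emit(TrieNode node)
    {
        if (node.Children.Count == 0) return "";

        // One alternative per child: the escaped character followed by its subtree.
        var branches = node.Children
            .Select(kv => Regex.Escape(kv.Key.ToString()) + Emit(kv.Value))
            .ToList();

        // A single mandatory continuation needs no group at all.
        if (branches.Count == 1 && !node.IsWordEnd) return branches[0];

        string group = "(?:" + string.Join("|", branches) + ")";
        // If a word ends at this node, everything after it is optional.
        return node.IsWordEnd ? group + "?" : group;
    }
}
```

For whole-word matching the result still has to be wrapped by the caller, for example as "^(?:" + pattern + ")$" or between \b anchors.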

Related

C# examining and replacing tuple values based on other tuple

I'm starting with programming and C# and I have two tuples. One tuple is representing a list of points:
static List<(string, string, string)> PR { get; set; } = new List<(string, string, string)>()
{
("P1", "0", "0"),
("P2", "P1", "P1+Height"),
("P3", "P1+Width", "P2"),
("P4", "P3", "P3+Height")
};
where Item1 in the list of tuples stands for a Point name (P1, P2, P3, P4) and Item2 and Item3 represent a parametric formula for respectively the x- and y-value of a point.
"P1" in the second item in the above list should look for the tuple starting with "P1", and then for the second item in that tuple, in this case, 0.
I have a second list of tuples that represent the parameters that I need to calculate the above point values.
static List<(string, double)> PAR { get; set; } = new List<(string, double)>()
{
("Height", 500),
("Width", 1000)
};
Say I want to calculate the value of the parametric formula "P3+Height" as follows:
P3+Height --> P2 (+Height) --> P1+Height (+Height) --> 0 (+Height) (+Height) --> 0 + Height + Height;
In the end I want to replace the parameter strings with the actual values (0 + 500 + 500 -> P3+Height = 1000), but that's a later concern.
Question: I'm trying to make a function that recursively evaluates the list of tuples and keeps the parameter names, but also looks for the corresponding tuple until we reach an end or exit situation. This is where I'm at now but I have a hard time getting my thought process in actual working code:
static void Main(string[] args)
{
//inputString = "P3+Height"
string inputString = PR[3].Item3;
string[] returnedString = GetParameterString(inputString);
#region READLINE
Console.ReadLine();
#endregion
}
private static string[] GetParameterString(string inputString)
{
string[] stringToEvaluate = SplitInputString(inputString);
for (int i = 0; i < stringToEvaluate.Length; i++)
{
//--EXIT CONDITION
if (stringToEvaluate[0] == "P1")
{
stringToEvaluate[i] = "0";
}
else
{
if (i % 2 == 0)
{
//Check if parameters[i] is point string
var value = PAR.Find(p => p.Item1.Equals(stringToEvaluate[i]));
//Check if parameters[i] is double string
if (double.TryParse(stringToEvaluate[i], out double result))
{
stringToEvaluate[i] = result.ToString();
}
else if (value == default)
{
//We have a point identifier
var relatingPR = PR.Find(val => val.Item1.Equals(stringToEvaluate[i])).Item2;
//stringToEvaluate[i] = PR.Find(val => val.Item1.Equals(pointId)).Item2;
stringToEvaluate = SearchParameterString(relatingPR);
}
else
{
//We have a parameter identifier
stringToEvaluate[i] = value.Item2.ToString();
}
}
}
}
return stringToEvaluate;
}
private static string[] SplitInputString(string inputString)
{
string[] splittedString;
splittedString = Regex.Split(inputString, Delimiters);
return splittedString;
}
Can anyone point me in the right direction of how this could be done with either recursion or some other, better, easier way?
In the end, I need to get a list of tuples like this:
("P1", "0", "0"),
("P2", "0", "500"),
("P3", "1000", "500"),
("P4", "1000", "1000")
Thanks in advance!

I wrote something that does this - I changed a bit of the structure to simplify the code and runtime, but it still returns the tuple you expect:
// first I used dictionaries so we can look up the corresponding value efficiently:
static Dictionary<string, (string width, string height)> PR { get; set; } =
new Dictionary<string, (string width, string height)>()
{
{ "P1", ("0", "0") },
{ "P2", ("P1", "P1+Height")},
{ "P3", ("P1+Width", "P2") },
{ "P4", ("P3", "P3+Height") }
};
static Dictionary<string, double> PAR { get; set; } = new Dictionary<string, double>()
{
{ "Height", 500 },
{ "Width", 1000 }
};
static void Main(string[] args)
{
// we want to "translate" each of the values height and width values
List<(string, string, string)> res = new List<(string, string, string)>();
foreach (var curr in PR)
{
// To keep the code generic we want the same code to run for height and width-
// but for functionality reasons we need to know which it is - so sent it as a parameter.
res.Add((curr.Key,
GetParameterVal(curr.Value.width, false).ToString(),
GetParameterVal(curr.Value.height, true).ToString()));
}
#region READLINE
Console.ReadLine();
#endregion
}
private static double GetParameterVal(string inputString, bool isHeight)
{
// for now the only delimiters are + and -; adapt this and the code as needed
// the lookahead splits the string just before each delimiter ("+Height", "-500", etc.)
string[] delimiters = new string[] { "\\+", "\\-" };
string[] stringToEvaluate =
Regex.Split(inputString, string.Format("(?=[{0}])", string.Join("", delimiters)));
// now we want to sum up each "part" of the string
var sum = stringToEvaluate.Select(x =>
{
double result;
int factor = 1;
// this will split from the delimiter, we will use it as a factor,
// ["+", "height"], ["-", "500"] etc..
string[] splitFromDelimiter=
Regex.Split(x, string.Format("(?<=[{0}])", string.Join("|", delimiters)));
if (splitFromDelimiter.Length > 1) {
factor = int.Parse(string.Format($"{splitFromDelimiter[0]}1"));
x = splitFromDelimiter[1];
}
if (PR.ContainsKey(x))
{
// if we need to continue recursively translate
result = GetParameterVal(isHeight ? PR[x].height : PR[x].width, isHeight);
}
else if (PAR.ContainsKey(x))
{
// exit condition
result = PAR[x];
}
else
{
// neither a point nor a parameter: treat it as a numeric literal,
// and fall back to 0 if it cannot be parsed
if (!double.TryParse(x, out result))
{
result = 0;
}
}
return factor * result;
}).Sum();
return sum;
}
}
}
I didn't add any validity checks, and if a value isn't found it receives a value of 0, but go ahead and adapt it to your needs.

Here is a working example for your question. It took me a lot of time, so I hope you appreciate it. The whole code is commented line by line. If you have any questions, do not hesitate to ask me!
First of all we create a class named myEntry that will represent an entry. The name has to be unique, e.g. P1, P2, P3.
public class myEntry
{
public string Name { get; private set; } //this field should be unique
public object Height { get; set; } //Can contain a reference to another entry or a value also as many combinations of those as you want.
// They have to be separated with a +
public object Width { get; set; } //same as for height here
public myEntry(string name, object height, object width)
{
//Set values
this.Name = name;
this.Height = height;
this.Width = width;
}
}
Now I create a dummy Exception class for an exception in a further class (you will see the use of this further on. Just ignore it for now)
public class UnknownEntry : Exception
{
//Create a new Class that represents an exception
}
Now we create the important class that will handle the entries and do all the work for us. This might look complicated, but if you don't want to spend time understanding it you can just copy and paste it; it's a working solution!
public class EntryHolder
{
private Dictionary<string, double> _par = new Dictionary<string, double>(); //Dictionary holding our known variables
private List<myEntry> _entries; //List holding our entries
public EntryHolder()
{
_entries = new List<myEntry>(); //Create list
//Populate dictionary
_par.Add("Height", 500);
_par.Add("Width", 1000);
}
public bool Add(myEntry entry)
{
var otherEntry = _entries.FirstOrDefault(x => x.Name.Equals(entry.Name)); //Get entry with same name
if(otherEntry != null)
{
//Entry with the same name as another entry
//throw new DuplicateNameException(); //Throw an exception if you want
return false; //or just return false
}
//Entry to add is valid
_entries.Add(entry); //Add entry
return true; //return success
}
public void Add(List<myEntry> entries)
{
foreach (var entry in entries) //Loop through entries
{
Add(entry);
}
}
public myEntry GetEntry(string uniqueName)
{
var entry = GetRawEntry(uniqueName); //Get raw entry
var heightToCalculate = entry.Height.ToString(); //Height to calculate to string
var widthToCalculate = entry.Width.ToString(); //Width to calculate to string
entry.Height = Calculate(heightToCalculate, true); //Calculate height
entry.Width = Calculate(widthToCalculate, false); //Calculate width
return entry; //return entry
}
public List<myEntry> CalculateAllEntries()
{
List<myEntry> toReturn = new List<myEntry>(); //Create list that we will return after the calculation finished
foreach (var entryToCalculate in _entries) //Loop through all entries
{
toReturn.Add(GetEntry(entryToCalculate.Name)); //calculate entry values and add them to the list we will return after
}
return toReturn; //return list after the whole calculation finished
}
private double Calculate(string toCalculate, bool isHeight)
{
if (!toCalculate.Contains("+"))
{
//String doesn't contain a + that means it has to be a number or a key in our dictionary
object toConvert = toCalculate; //Set the object we want to convert to double
var entryCorrespondingToThisValue = _entries.FirstOrDefault(x => x.Name.Equals(toCalculate)); //Check if the string is a reference to another entry
if (entryCorrespondingToThisValue != null)
{
//It is the name of another object
toConvert = isHeight ? entryCorrespondingToThisValue.Height : entryCorrespondingToThisValue.Width; //Set object to convert to the height or width of the object in entries
}
try
{
return Convert.ToDouble(toConvert); //try to convert and return if success
}
catch (Exception e)
{
//the given height object has the wrong format
//Format: (x + Y + z ...)
throw new FormatException();
}
}
//Contains some +
var spitedString = toCalculate.Split(new char[] {'+'}); //Split
double sum = 0d; //Whole sum
foreach (var splited in spitedString) //Loop through all elements
{
double toAdd = 0; //To add default = 0
if (_par.ContainsKey(splited)) //Check if 'splited' is a key in the dictionary
{
//part of object is in the par dictionary so we get the value of it
toAdd = _par[splited]; //get value corresponding to key in dictionary
}
else
{
//'splited' is not a key in the dictionary
object toConvert = splited; //set object to convert
var entryCorrespondingToThisValue = _entries.FirstOrDefault(x => x.Name.Equals(splited)); //Does entries contain a reference to this value
if (entryCorrespondingToThisValue != null)
{
//It is a reference
toConvert = isHeight ? entryCorrespondingToThisValue.Height : entryCorrespondingToThisValue.Width; //Set to convert to references height or width
}
try
{
toAdd = Convert.ToDouble(toConvert); //Try to convert object
}
catch (Exception e)
{
//A part of the given height is not a number or is known in the par dictionary
throw new FormatException();
}
}
sum += toAdd; //Add after one iteration
}
return sum; //return whole sum
}
public myEntry GetRawEntry(string uniqueName)
{
var rawEntry = _entries.FirstOrDefault(x => x.Name.Equals(uniqueName)); //Check for entry in entries by name (unique)
if (rawEntry == null)
{
//Entry is not in the list holding all entries
throw new UnknownEntry(); //throw an exception (or return null here instead, if you prefer)
}
return rawEntry; //return entry
}
}
And finally, the test that shows it works:
public void TestIt()
{
List<myEntry> entries = new List<myEntry>()
{
new myEntry("P1", 0, 0),
new myEntry("P2", "P1", "P1+Height"),
new myEntry("P3", "P1+Height", "P2"),
new myEntry("P4", "P3", "P3+Height"),
};
EntryHolder myEntryHolder = new EntryHolder();
myEntryHolder.Add(entries);
var calculatedEntries = myEntryHolder.CalculateAllEntries();
}

Algorithm for shortest list of words

The issue is as follows: the user provides StartWord and EndWord strings of X letters, together with a list of strings that are also of length X (let's make it 4, but it could be more).
static void Main(string[] args)
{
string StartWord = "Spot";
string EndWord = "Spin";
List<string> providedList = new List<string>
{
"Spin", "Spit", "Spat", "Spot", "Span"
};
List<string> result = MyFunc(StartWord, EndWord, providedList);
}
public List<string> MyFunc(string startWord, string endWord, List<string> input)
{
???
}
From the provided parameters I need to display to the user a result that consists of the SHORTEST list of 4-letter words, starting with StartWord and ending with EndWord, with intermediate words taken from the list, where each word differs from the previous word by PRECISELY one letter.
For example the above code should return a list of strings containing these elements:
Spot(as FirstWord),
Spit(only one letter is different from previous word),
Spin (as EndWord)
A bad example would be: Spot, Spat, Span, Spin (it takes 3 changes compared to the 2 above)
I have been looking at some matching algorithms and recursion, but I am not able to figure out how to go about this.
Thank you for any kind of help in advance.

Create a graph where the vertices are words, and an edge connects any two words that differ by one letter.
Do a breadth-first search, starting at the StartWord, looking for the shortest path to the EndWord.
Here is sample code for this solution in a different language (Python). That may give you an even better pointer. :-)
def shortestWordPath (startWord, endWord, words):
    # Build the graph: each word maps to the words one letter away from it.
    graph = {}
    for word in words:
        graph[word] = {"connected": []}
    for word in words:
        for otherWord in words:
            if 1 == wordDistance(word, otherWord):
                graph[word]['connected'].append(otherWord)

    # Breadth-first search; 0 marks "no predecessor" for the start word.
    todo = [(startWord,0)]
    while len(todo):
        (thisWord, fromWord) = todo.pop(0)
        if thisWord == endWord:
            # Walk the "from" links back to the start, then reverse.
            answer = [thisWord, fromWord]
            while graph[ answer[-1] ]["from"] != 0:
                answer.append(graph[ answer[-1] ]["from"])
            answer.reverse()
            return answer
        elif "from" in graph[thisWord]:
            pass # We have already processed this.
        else:
            graph[thisWord]["from"] = fromWord
            for nextWord in graph[thisWord]["connected"]:
                todo.append([nextWord, thisWord])
    return None

def wordDistance (word1, word2):
    return len(differentPositions(word1, word2))

def differentPositions(word1, word2):
    answer = []
    for i in range(0, min(len(word1), len(word2))):
        if word1[i] != word2[i]:
            answer.append(i)
    # Positions past the end of the shorter word also count as differences.
    for i in range(min(len(word1), len(word2)),
                   max(len(word1), len(word2))):
        answer.append(i)
    return answer

print(shortestWordPath("Spot", "Spin",
    ["Spin", "Spit", "Spat", "Spot", "Span"]))
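Since the question is about C#, the same graph-plus-BFS idea can be sketched there as well. This is a minimal sketch, not a tuned implementation: the names WordLadderSketch, ShortestPath and DiffersByOne are made up for the illustration, neighbours are found by scanning the whole word list on every step instead of precomputing the graph, and null is returned when no ladder exists.

```csharp
using System;
using System.Collections.Generic;

public static class WordLadderSketch
{
    public static List<string> ShortestPath(string start, string end, IEnumerable<string> words)
    {
        var pool = new HashSet<string>(words) { start };

        // cameFrom doubles as the "visited" set; the start word has no predecessor.
        var cameFrom = new Dictionary<string, string> { [start] = null };
        var queue = new Queue<string>();
        queue.Enqueue(start);

        while (queue.Count > 0)
        {
            string current = queue.Dequeue();
            if (current == end)
            {
                // Follow the predecessor links back to the start, then reverse.
                var path = new List<string>();
                for (string w = current; w != null; w = cameFrom[w])
                    path.Add(w);
                path.Reverse();
                return path;
            }

            foreach (string candidate in pool)
            {
                if (!cameFrom.ContainsKey(candidate) && DiffersByOne(current, candidate))
                {
                    cameFrom[candidate] = current;
                    queue.Enqueue(candidate);
                }
            }
        }
        return null; // no ladder connects start and end
    }

    private static bool DiffersByOne(string a, string b)
    {
        if (a.Length != b.Length) return false;
        int differences = 0;
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i] && ++differences > 1) return false;
        }
        return differences == 1;
    }
}
```

Because breadth-first search visits words in order of their distance from the start, the first time the end word is dequeued the recorded path is a shortest one.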

This is what I ended up using (please feel free to comment on its upsides and downsides):
private List<List<string>> allWordSteps;
private string[] allWords;
public List<string> WordLadder(string wordStart, string wordEnd, string[] allWordsInput)
{
var wordLadder = new List<string>() { wordStart };
this.allWordSteps = new List<List<string>>() { wordLadder };
allWords = allWordsInput;
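do
{
wordLadder = this.IterateWordSteps(wordEnd);
}
// stop when a ladder is found, or when no partial ladders remain (no path exists)
while (wordLadder.Count == 0 && this.allWordSteps.Count > 0);
return wordLadder;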
}
private List<string> IterateWordSteps(string wordEnd)
{
List<List<string>> allWordStepsCopy = this.allWordSteps.ToList();
this.allWordSteps.Clear();
foreach (var wordSteps in allWordStepsCopy)
{
var adjacent = this.allWords.Where(
x => this.IsOneLetterDifferent(x, wordSteps.Last()) &&
!wordSteps.Contains(x));
if (adjacent.Contains(wordEnd))
{
wordSteps.Add(wordEnd);
return wordSteps;
}
foreach (var word in adjacent)
{
List<string> newWordStep = wordSteps.ToList();
newWordStep.Add(word);
this.allWordSteps.Add(newWordStep);
}
}
return new List<string>();
}
private bool IsOneLetterDifferent(string first, string second)
{
int differences = 0;
if (first.Length == second.Length)
{
for (int i = 0; i < first.Length; i++)
{
if (first[i] != second[i])
{
differences++;
}
}
}
return differences == 1;
}

OutOfMemoryException when updating a large list?

I have a large list and I would like to overwrite one value if required. To do this, I create two subsets of the list which seems to give me an OutOfMemoryException. Here is my code snippet:
if (ownRG != "")
{
List<string> maclist = ownRG.Split(',').ToList();
List<IVFile> temp = powlist.Where(a => maclist.Contains(a.Machine)).ToList();
powlist = powlist.Where(a => !maclist.Contains(a.Machine)).ToList(); // OOME Here
temp.ForEach(a => { a.ReportingGroup = ownRG; });
powlist.AddRange(temp);
}
Essentially I'm splitting the list into the part that needs updating and the part that doesn't, then I perform the update and put the list back together. This works fine for smaller lists, but breaks with an OutOfMemoryException on the third row within the if for a large list. Can I make this more efficient?
NOTE
powlist is the large list (>1m items). maclist only has between 1 and 10 items, but even with 1 item this breaks.

Solving your issue
Here is how to rearrange your code using the enumerator code from my answer:
if (!string.IsNullOrEmpty(ownRG))
{
var maclist = new CommaSeparatedStringEnumerable(ownRG);
var temp = powlist.Where(a => maclist.Contains(a.Machine));
foreach (var p in temp)
{
p.ReportingGroup = ownRG;
}
}
You should not call ToList here, since each call allocates a new list.
You don't need to remove the contents of temp from powlist (you are re-adding them anyway).
Streaming over a large comma-separated string
You can iterate over the list manually instead of doing what you do now, by looking for , characters and remembering the position of the last found one and the one before. This will definitely make your app work because then it won't need to store the entire set in the memory at once.
Code example:
var str = "aaa,bbb,ccc";
var previousComma = -1;
var currentComma = 0;
for (; (currentComma = str.IndexOf(',', previousComma + 1)) != -1; previousComma = currentComma)
{
var currentItem = str.Substring(previousComma + 1, currentComma - previousComma - 1);
Console.WriteLine(currentItem);
}
var lastItem = str.Substring(previousComma + 1);
Console.WriteLine(lastItem);
Custom iterator
If you want to do it 'properly' in a fancy way, you can even write a custom enumerator:
public class CommaSeparatedStringEnumerator : IEnumerator<string>
{
int previousComma = -1;
int currentComma = -1;
string bigString = null;
bool atEnd = false;
public CommaSeparatedStringEnumerator(string s)
{
if (s == null)
throw new ArgumentNullException("s");
bigString = s;
this.Reset();
}
public string Current { get; private set; }
public void Dispose() { /* No need to do anything here */ }
object IEnumerator.Current { get { return this.Current; } }
public bool MoveNext()
{
if (atEnd)
return false;
atEnd = (currentComma = bigString.IndexOf(',', previousComma + 1)) == -1;
if (!atEnd)
Current = bigString.Substring(previousComma + 1, currentComma - previousComma - 1);
else
Current = bigString.Substring(previousComma + 1);
previousComma = currentComma;
return true;
}
public void Reset()
{
previousComma = -1;
currentComma = -1;
atEnd = false;
this.Current = null;
}
}
public class CommaSeparatedStringEnumerable : IEnumerable<string>
{
string bigString = null;
public CommaSeparatedStringEnumerable(string s)
{
if (s == null)
throw new ArgumentNullException("s");
bigString = s;
}
public IEnumerator<string> GetEnumerator()
{
return new CommaSeparatedStringEnumerator(bigString);
}
IEnumerator IEnumerable.GetEnumerator()
{
return this.GetEnumerator();
}
}
Then you can iterate over it like this:
var str = "aaa,bbb,ccc";
var enumerable = new CommaSeparatedStringEnumerable(str);
foreach (var item in enumerable)
{
Console.WriteLine(item);
}
Other thoughts
Can I make this more efficient?
Yes, you can. I suggest working with a more efficient data format (look into databases, XML, JSON, etc., depending on your needs). If you really want to work with comma-separated items, see my code examples above.

There's no need to create a bunch of sub-lists from powlist and reconstruct it. Simply loop over the powlist and update the ReportingGroup property accordingly.
var maclist = new HashSet<string>( ownRG.Split(',') );
foreach( var item in powlist) {
if( maclist.Contains( item.Machine ) ){
item.ReportingGroup = ownRG;
}
}
Since this changes powlist in place, you don't allocate any copies of the large list (only the small HashSet) and shouldn't run into an OutOfMemoryException.

In a loop find the next ',' char. Take the substring between the ',' and the previous ',' position. At the end of the loop save a reference to the previous ',' position (which is initially set to 0). So you parse the items one-by-one rather than all at once.

You can try looping the items of your lists, but this will increase processing time.
foreach(var item in powlist)
{
//do your operations
}

dynamic flexibility in C#

I have recently started learning programming and chose .NET with Visual Studio Express. I am trying to write a CSV Parser as a learning experience and it's giving me a lot more trouble than I expected. I am starting with the reader. One thing I am doing differently in my parser is that I am not using quotes. I am escaping commas with a backslash, backslashes with a backslash, and line breaks with a backslash. For example, if a comma is preceded by an even number of backslashes it is a field and I halve any blocks of backslashes. If it's odd, it's not end of field and I still halve blocks of backslashes. I'm not sure how robust this will be if I can ever get it working, except I'm only learning at this point and I'm looking at it mostly as an exercise in manipulating data structures.
I have a question in reference to the code snippet at the bottom of this post and how to make it not so static and limiting and still compile and run for me.
The line of code that reads:
var contents = (String)fileContents;
I keep trying to make it more dynamic to increase flexibility and make it something like this:
var contents = (otherVariableThatCouldChangeTypeAtRuntime.GetType())fileContents;
Is there something I can do to get it to do this and still compile? Maybe something like Option Infer from VB.NET might help, except I can't find that.
Also, I have written this in VB.NET as well. It seems to me that VB.NET allows me a considerably more dynamic style than what I've posted below, such as not having to type var over and over again and not having to keep casting my index counting variable into an integer over and over again if I shut off Option Strict and Option Explicit as well as turn on Option Infer. For example, C# won't let me type something analogous to the following VB.NET code even though I know the methods and properties I will be calling at run-time will be there at run-time.
Dim contents As Object = returnObjectICantDetermineAtComplieTime()
contents.MethodIKnowWillBeThereAtRunTime()
Can I do these things in C#? Anyways, here's the code and thanks in advance for any responses.
public class Widget
{
public object ID { get; set; }
public object PartNumber { get; set; }
public object VendorID { get; set; }
public object TypeID { get; set; }
public object KeyMarkLoc { get; set; }
public Widget() { }
}
public object ReadFromFile(object source)
{
var fileContents = new FileService().GetFileContents(source);
object records = null;
if (fileContents == null)
return null;
var stringBuffer = "";
var contents = (String)fileContents;
while (contents.Length > 0 && contents != "\r\n")
{
for (object i = 0; (int)i < contents.Length; i=(int)i+1 )
{
object character = contents[(int)i];
if (!stringBuffer.EndsWith("\r\n"))
{
stringBuffer += character.ToString();
}
if (stringBuffer.EndsWith("\r\n"))
{
var bSlashes = getBackSlashes(stringBuffer.Substring(0, stringBuffer.Length - 4));
stringBuffer = stringBuffer.Substring(0, stringBuffer.Length - 4);
if ((int)bSlashes % 2 == 0)
{
break;
}
}
}
contents = contents.Substring(stringBuffer.Length+2);
records = records == null ? getIncrementedList(new List<object>(), getNextObject(getFields(stringBuffer))) : getIncrementedList((List<object>)records, getNextObject(getFields(stringBuffer)));
}
return records;
}
private Widget getNextRecord(object[] fields)
{
var personStudent = new Widget();
personStudent.ID = fields[0];
personStudent.PartNumber = fields[1];
personStudent.VendorID = fields[2];
personStudent.TypeID = fields[3];
personStudent.GridPath = fields[4];
return personStudent;
}
private object[] getFields(object buffer)
{
var fields = new object[5];
var intFieldCount = 0;
var fieldVal = "";
var blocks = buffer.ToString().Split(',');
foreach (var block in blocks)
{
var bSlashes = getBackSlashes(block);
var intRemoveCount = (int)bSlashes / 2;
if ((int)bSlashes % 2 == 0) // Delimiter
{
fieldVal += block.Substring(0, block.Length - intRemoveCount);
fields[intFieldCount] += fieldVal;
intFieldCount++;
fieldVal = "";
}
else // Part of Field
{
fieldVal += block.Substring(0, block.Length - intRemoveCount - 1) + ",";
}
}
return fields;
}
private object getBackSlashes(object block)
{
object bSlashes = block.ToString().Length == 0 ? new int?(0) : null;
for (object i = block.ToString().Length - 1; (int)i>-1; i=(int)i-1)
{
if (block.ToString()[(int)i] != '\\') return bSlashes = bSlashes == null ? 0 : bSlashes;
bSlashes = bSlashes == null ? 1 : (int)bSlashes + 1;
}
return bSlashes;
}
}
Here is the web service code.
[WebMethod]
public object GetFileContents(object source)
{
return File.ReadAllText(source.ToString());
}

Dim contents As Object = returnObjectICantDetermineAtComplieTime()
contents.MethodIKnowWillBeThereAtRunTime()
You can do this with the dynamic type.
See for more information: http://msdn.microsoft.com/en-us/library/dd264736.aspx
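To make that concrete, here is a minimal sketch of the VB.NET snippet expressed with dynamic (available from C# 4.0). The class and method names (DynamicSketch, ReturnObjectUnknownAtCompileTime, LengthOf) are invented for the illustration; the Length property stands in for "a member I know will be there at run time", since both string and StringBuilder happen to have one.

```csharp
using System;
using System.Text;

public static class DynamicSketch
{
    // Hypothetical stand-in for a method whose concrete return type
    // is only known at run time.
    public static object ReturnObjectUnknownAtCompileTime(bool wantBuilder)
    {
        return wantBuilder ? (object)new StringBuilder("abc") : "abc";
    }

    public static int LengthOf(object value)
    {
        // With dynamic, member lookup is deferred until run time,
        // so the compiler does not check that Length exists.
        dynamic contents = value;
        return contents.Length;
    }
}
```

If the member is missing at run time, the call throws a Microsoft.CSharp.RuntimeBinder.RuntimeBinderException instead of failing at compile time, so this trades compile-time checking for flexibility.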

Logical Inversion of Symbol Tree

I have a class, Symbol_Group, that represents an invertible expression of the nature AB(C+DE) + FG. Symbol_Group contains a List<List<iSymbol>>, where iSymbol is an interface applied to Symbol_Group, and Symbol.
The above equation would be represented as A,B,Sym_Grp + F,G; Sym_Grp = C + D,E, where each + represents a new List<iSymbol>
I need to be able to invert and expand this equation using an algorithm that can handle any amount of nesting, and any number of symbols ANDed or ORed together, to produce a set of Symbol_Group, with each containing a unique expansion. For the above equation the answer set would be !A!F; !B!F; !C!D!F; !C!E!F; !A!G; !B!G; !C!D!G; !C!E!G;
I know that I will need to use recursion, but I have had very little experience with it. Any help figuring out this algorithm would be appreciated.

Unless you are somehow required to use a List<List<iSymbol>>, I recommend switching to a different class structure, with a base class (or interface) Expression and subclasses (or implementors) SymbolExpression, NotExpression, OrExpression, and AndExpression. A SymbolExpression contains a single symbol; a NotExpression contains one Expression, and OrExpression and AndExpression contain two expressions each. This is a much more standard structure for working with mathematical expressions, and it is probably simpler to perform the transformations on it.
With the above classes, you can model any expression as a binary tree. Negate the expression by replacing the root by a NotExpression whose child is the original root. Then, traverse the tree with a depth-first search, and whenever you hit a NotExpression whose child is an OrExpression or an AndExpression, you can replace that by an AndExpression or an OrExpression (respectively) whose children are NotExpressions with the original children below them. You might also want to eliminate double negations (look for NotExpressions whose child is a NotExpression, and remove both).
(Whether this answer is understandable probably depends on how comfortable you are with working with trees. Let me know if you need clarification.)
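A rough sketch of that class structure and the negation step, under my own assumptions: the class names follow the answer, but the Left/Right/Operand properties, the ToString formats and the DeMorgan helper are invented for the illustration. Instead of inserting a NotExpression at the root and rewriting the tree afterwards, the sketch pushes the negation down recursively, which applies the same De Morgan rules and removes double negations along the way.

```csharp
using System;

public abstract class Expression { }

public sealed class SymbolExpression : Expression
{
    public string Name { get; }
    public SymbolExpression(string name) { Name = name; }
    public override string ToString() { return Name; }
}

public sealed class NotExpression : Expression
{
    public Expression Operand { get; }
    public NotExpression(Expression operand) { Operand = operand; }
    public override string ToString() { return "!" + Operand; }
}

public sealed class AndExpression : Expression
{
    public Expression Left { get; }
    public Expression Right { get; }
    public AndExpression(Expression left, Expression right) { Left = left; Right = right; }
    public override string ToString() { return "(" + Left + " & " + Right + ")"; }
}

public sealed class OrExpression : Expression
{
    public Expression Left { get; }
    public Expression Right { get; }
    public OrExpression(Expression left, Expression right) { Left = left; Right = right; }
    public override string ToString() { return "(" + Left + " | " + Right + ")"; }
}

public static class DeMorgan
{
    // Returns an expression equivalent to NOT e, with negations only on symbols.
    public static Expression Negate(Expression e)
    {
        switch (e)
        {
            case NotExpression n: return Normalize(n.Operand);   // !!x == x
            case AndExpression a: return new OrExpression(Negate(a.Left), Negate(a.Right));
            case OrExpression o: return new AndExpression(Negate(o.Left), Negate(o.Right));
            default: return new NotExpression(e);                // a plain symbol
        }
    }

    // Returns an expression equivalent to e, with negations only on symbols.
    public static Expression Normalize(Expression e)
    {
        switch (e)
        {
            case NotExpression n: return Negate(n.Operand);
            case AndExpression a: return new AndExpression(Normalize(a.Left), Normalize(a.Right));
            case OrExpression o: return new OrExpression(Normalize(o.Left), Normalize(o.Right));
            default: return e;
        }
    }
}
```

For AB(C+DE) + FG this gives (!A + !B + !C(!D + !E))(!F + !G) in the question's notation. It does not multiply the result out into the list of product terms the question asks for; distributing AND over OR would be a separate pass.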

After much work, this is the method I used to get the minimum terms of inversion.
public List<iSymbol> GetInvertedGroup()
{
TrimSymbolList();
List<List<iSymbol>> symbols = this.CopyListMembers(Symbols);
List<iSymbol> SymList;
while (symbols.Count > 1)
{
symbols.Add(MultiplyLists(symbols[0], symbols[1]));
symbols.RemoveRange(0, 2);
}
SymList = symbols[0];
for(int i=0;i<symbols[0].Count;i++)
{
if (SymList[i] is Symbol)
{
Symbol sym = SymList[i] as Symbol;
SymList.RemoveAt(i--);
Symbol_Group symgrp = new Symbol_Group(null);
symgrp.AddSymbol(sym);
SymList.Add(symgrp);
}
}
for (int i = 0; i < SymList.Count; i++)
{
if (SymList[i] is Symbol_Group)
{
Symbol_Group SymGrp = SymList[i] as Symbol_Group;
if (SymGrp.Symbols.Count > 1)
{
List<iSymbol> list = SymGrp.GetInvertedGroup();
SymList.RemoveAt(i--);
AddElementsOf(list, SymList);
}
}
}
return SymList;
}
public List<iSymbol> MultiplyLists(List<iSymbol> L1, List<iSymbol> L2)
{
List<iSymbol> Combined = new List<iSymbol>(L1.Count + L2.Count);
foreach (iSymbol S1 in L1)
{
foreach (iSymbol S2 in L2)
{
Symbol_Group newGrp = new Symbol_Group(null);
newGrp.AddSymbol(S1);
newGrp.AddSymbol(S2);
Combined.Add(newGrp);
}
}
return Combined;
}
This resulted in a List of Groups of Symbols, with each group representing one OR term in the final result (e.g. !A!F). Some further code was used to reduce this to a List<List<Symbol>>, as there was a reasonable amount of nesting in the answer. To reduce it, I used:
public List<List<Symbol>> ReduceList(List<iSymbol> List)
{
List<List<Symbol>> Output = new List<List<Symbol>>(List.Count);
foreach (iSymbol iSym in List)
{
if (iSym is Symbol_Group)
{
List<Symbol> L = new List<Symbol>();
(iSym as Symbol_Group).GetAllSymbols(L);
Output.Add(L);
}
else
{
throw (new Exception());
}
}
return Output;
}
public void GetAllSymbols(List<Symbol> List)
{
foreach (List<iSymbol> SubList in Symbols)
{
foreach (iSymbol iSym in SubList)
{
if (iSym is Symbol)
{
List.Add(iSym as Symbol);
}
else if (iSym is Symbol_Group)
{
(iSym as Symbol_Group).GetAllSymbols(List);
}
else
{
throw(new Exception());
}
}
}
}
Hope this helps someone else!

I came to this simpler solution after a bit of rejigging. I hope it helps out somebody else with a similar problem! This is the class structure (plus a few other properties)
public class SymbolGroup : iSymbol
{
public SymbolGroup(SymbolGroup Parent, SymRelation Relation)
{
Symbols = new List<iSymbol>();
this.Parent = Parent;
SymbolRelation = Relation;
if (SymbolRelation == SymRelation.AND)
Name = "AND Group";
else
Name = "OR Group";
}
public int Depth
{
get
{
foreach (iSymbol s in Symbols)
{
if (s is SymbolGroup)
{
return (s as SymbolGroup).Depth + 1;
}
}
return 1;
}
}
}
The method of inversion is also contained within this class. It replaces an unexpanded group in the results list with all of the expanded results of that result. It only strips away one level at a time.
public List<SymbolGroup> InvertGroup()
{
List<SymbolGroup> Results = new List<SymbolGroup>();
foreach (iSymbol s in Symbols)
{
if (s is SymbolGroup)
{
SymbolGroup sg = s as SymbolGroup;
sg.Parent = null;
Results.Add(s as SymbolGroup);
}
else if (s is Symbol)
{
SymbolGroup sg = new SymbolGroup(null, SymRelation.AND);
sg.AddSymbol(s);
Results.Add(sg);
}
}
bool AllChecked = false;
while (!AllChecked)
{
AllChecked = true;
for(int i=0;i<Results.Count;i++)
{
SymbolGroup result = Results[i];
if (result.Depth > 1)
{
AllChecked = false;
Results.RemoveAt(i--);
}
else
continue;
if (result.SymbolRelation == SymRelation.OR)
{
Results.AddRange(result.MultiplyOut());
continue;
}
for(int j=0;j<result.nSymbols;j++)
{
iSymbol s = result.Symbols[j];
if (s is SymbolGroup)
{
result.Symbols.RemoveAt(j--); //removes the symbolgroup that is being replaced, so that the rest of the group can be added to the expansion.
AllChecked = false;
SymbolGroup subResult = s as SymbolGroup;
if(subResult.SymbolRelation == SymRelation.OR)
{
List<SymbolGroup> newResults;
newResults = subResult.MultiplyOut();
foreach(SymbolGroup newSg in newResults)
{
newSg.Symbols.AddRange(result.Symbols);
}
Results.AddRange(newResults);
}
break;
}
}
}
}
return Results;
}
