Piglatin using Arrays

Last night I was messing around with Pig Latin using arrays and found out I could not reverse the process. How would I shift each word back, strip the characters "a" and "y" from its end, and return the original word in the phrase?
For instance, if I entered "piggy" it would come out as "iggypay": the word is shifted so that "p" is at the end, and "ay" is appended.
Here is the example code so you can try it as well.
public string PigLatin(string phrase)
{
    // Requires: using System.Collections;
    ArrayList pLatinPhrase = new ArrayList();
    // Split on whitespace; skip the empty entries produced by repeated spaces.
    foreach (string pl in phrase.Split())
    {
        if (pl.Length == 0)
            continue;
        // Move the first letter to the end and append "ay".
        pLatinPhrase.Add(pl.Substring(1) + pl.Substring(0, 1) + "ay");
    }
    // Join with spaces so the words can be split apart again later.
    return string.Join(" ", pLatinPhrase.ToArray());
}
You will notice that this example is not programmed to find vowels and treat them specially. It simply moves the first letter to the end and appends "ay", which is just a basic way of doing it.
If you were wondering how to reverse the above, try this example, uPigLatinify:
public string uPigLatinify(string word)
{
    // Requires: using System.Collections;
    // Using an ArrayList to store the restored words.
    ArrayList Phrase = new ArrayList();
    // Split the phrase into words.
    foreach (string i in word.Split(' '))
    {
        int wordLength = i.Length;
        if (wordLength >= 3)
        {
            // Take the 3rd letter from the end (the original first letter) and
            // put it in front of the rest, dropping the trailing "ay".
            Phrase.Add(i.Substring(wordLength - 3, 1) + i.Substring(0, wordLength - 3));
        }
        else if (wordLength > 0)
        {
            // Too short to be Pig Latin; keep it unchanged.
            Phrase.Add(i);
        }
    }
    // Rejoin the words with single spaces and return.
    return string.Join(" ", Phrase.ToArray());
}

Please don’t take this the wrong way, but although you can probably get people here to give you the C# code to implement the algorithm you want, I suspect this is not enough if you want to learn how it works. To learn the basics of programming, there are some good tutorials to delve into (whether websites or books). In particular, if you aspire to be a programmer, you will need to learn more than just how to write code. In your example:
You should first write a specification of what your PigLatin function is supposed to do. Think about all the corner-cases: What if the first letter is a vowel? What if there are several consonants at the beginning? What if there are only consonants? What if the input starts with a number, a parenthesis, or a space? What if the input string is empty? Write down exactly what should happen in all of these cases — even if it’s “throw an exception”.
Only then can you implement the algorithm according to the specification (i.e. write the actual C# code). While doing this, you may find that the specification is incomplete, in which case you need to go back and correct it.
Once your code is finished, you need to test it. Run it on several test cases, especially the corner cases you came up with above: for example, try PigLatin("air"), PigLatin("x"), PigLatin("1"), PigLatin(""), etc. In each case, first decide what behaviour you expect, and then see if the actual behaviour matches your expectation. If it doesn’t, you need to go back and fix the code.
Once you have implemented the forward PigLatin algorithm and it works (read: passes all your test cases), you will already have the skills needed to write the reverse function yourself. I guarantee you that you will feel accomplished and excited then! Whereas, if you just copy the code from this website, you are setting yourself up for feeling dumb because you will think other people can do it and you can’t.
Of course, we are nonetheless happy to help you with specific technical questions, for example “What is the difference between ArrayList and List<string>?” or “What does the scope of a local variable mean?” (but search first — these may have already been asked before) — but you probably shouldn’t ask to have the code fully written and finished for you.
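The specify-then-test loop described above could look something like the following sketch. The rules in the comment are an assumed specification (one common Pig Latin variant), not something the question defines, and the names PigLatinSpec and ToPigLatin are made up for illustration.

```csharp
using System;

public static class PigLatinSpec
{
    // Assumed specification (hypothetical, one of several possible variants):
    //  - null, empty, or input not starting with a letter is returned unchanged
    //  - a word starting with a vowel gets "way" appended
    //  - a word with no vowels at all just gets "ay" appended
    //  - otherwise the leading consonants move to the end, followed by "ay"
    // "y" is treated as a consonant here to keep the rules short.
    public static string ToPigLatin(string word)
    {
        if (string.IsNullOrEmpty(word) || !char.IsLetter(word[0]))
            return word;

        int firstVowel = word.IndexOfAny("aeiouAEIOU".ToCharArray());
        if (firstVowel == 0)
            return word + "way";
        if (firstVowel < 0)
            return word + "ay";

        // e.g. "piggy" -> "iggy" + "p" + "ay"
        return word.Substring(firstVowel) + word.Substring(0, firstVowel) + "ay";
    }
}
```

With the specification written down like this, each corner case from the list ("air", "x", "1", the empty string) has exactly one expected result, which you can note down before calling PigLatinSpec.ToPigLatin and comparing.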

The work to split the phrase into words and recombine the words after transforming them is the same as in the original case. The difficulty is in un-pig-latin-ifying an individual word. With some error checking, I imagine you could do this:
// Requires: using System.Text.RegularExpressions;
string UnPigLatinify(string word)
{
    if ((word == null) || !Regex.IsMatch(word, @"^\w+ay$", RegexOptions.IgnoreCase))
        return word;
    return word[word.Length - 3] + word.Substring(0, word.Length - 3);
}
The regular expression just checks that the word is at least 3 characters long, is composed of word characters (letters, digits, or underscores), and ends with "ay".
The actual transform takes the third to last letter (the original first letter) and appends the rest of the word minus the "ay" and the original letter.
Is this what you meant?
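For completeness, here is a minimal sketch of how the single-word function might be applied to a whole phrase. The class name and the UnPigLatinifyPhrase helper are hypothetical, and the sketch assumes the Pig Latin words are separated by spaces.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PigLatinReverser
{
    // Same single-word logic as in the answer above.
    public static string UnPigLatinify(string word)
    {
        if (word == null || !Regex.IsMatch(word, @"^\w+ay$", RegexOptions.IgnoreCase))
            return word;
        return word[word.Length - 3] + word.Substring(0, word.Length - 3);
    }

    // Hypothetical helper: split on spaces, restore each word, rejoin with spaces.
    public static string UnPigLatinifyPhrase(string phrase)
    {
        string[] words = phrase.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", words.Select(w => UnPigLatinify(w)));
    }
}
```

Keep in mind that the reverse direction is ambiguous for ordinary words that happen to end in "ay": "day" passes the check and comes back as "d". A real implementation would need some rule for such words.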

Related

Removing punctuation from an extremely long string

I'm working on a book encryption program for one of my courses and I've run into a problem. Our professor gave us the example of using, say, Pride and Prejudice as the book used to encrypt, so I chose that one to test my program. The function I'm currently using to remove the punctuation from the string takes so long that the program is forced into break mode. It works for smaller strings, even ones several pages long, but when I feed it Pride and Prejudice it takes way too long.
public void removePunctuation(ref string s) {
    string result = "";
    for (int i = 0; i < s.Length; i++) {
        if (Char.IsWhiteSpace(s[i])) {
            result += ' ';
        } else if (!Char.IsLetter(s[i]) && !Char.IsNumber(s[i])) {
            // do nothing
        } else {
            result += s[i];
        }
    }
    s = result;
}
So I think I need a faster way to remove punctuation from this string if anyone has any suggestions? I know looping through every character is horrible, but I'm stumped and I was never taught Regex in depth.
Edit: I was asked how I was storing the string in the dictionary class! This is the constructor for another class that actually uses the formatted string.
public CodeBook(string book)
{
    BookMap = new Dictionary<string, List<int>>();
    Key = book.Split(null).ToList(); // split string into words
    foreach(string s in Key)
    {
        if (!BookMap.Keys.Contains(s))
        {
            BookMap.Add(s, Enumerable.Range(0, Key.Count).Where(i => Key[i] == s).ToList());
            // add word and add list of occurrences of word
        }
    }
}

This is slow because you build the string by concatenation in a loop. Several approaches perform better:
Use StringBuilder - unlike string concatenation, which constructs a new string object each time you add a character, StringBuilder grows the string under construction in larger chunks, preventing excessive garbage creation.
Use LINQ's filtering with Where - this approach collects the characters you keep into an array of chars, then constructs a single string from it.
Use the regular expression Replace method - it does the filtering in a single call and builds the result string once instead of once per character.
Roll your own algorithm - create an array of chars that corresponds to the length of the original string. Walk through the string, and add the characters that you wish to keep to the array. Use string's constructor that takes the array, the initial index, and the length to construct the string at once.
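As a rough illustration of the LINQ and roll-your-own options from the list above, here is a minimal sketch. The class and method names are made up, and both methods assume the same rule as the question's loop: whitespace becomes a single space character, letters and numbers are kept, everything else is dropped.

```csharp
using System;
using System.Linq;

public static class PunctuationStripper
{
    // LINQ variant: filter the characters, then build one string from the array.
    public static string WithLinq(string s)
    {
        char[] kept = s
            .Where(c => char.IsWhiteSpace(c) || char.IsLetter(c) || char.IsNumber(c))
            .Select(c => char.IsWhiteSpace(c) ? ' ' : c)
            .ToArray();
        return new string(kept);
    }

    // Roll-your-own variant: fill a buffer as long as the input, then
    // construct the string once from the part that was actually used.
    public static string WithBuffer(string s)
    {
        char[] buffer = new char[s.Length];
        int count = 0;
        foreach (char c in s)
        {
            if (char.IsWhiteSpace(c))
                buffer[count++] = ' ';
            else if (char.IsLetter(c) || char.IsNumber(c))
                buffer[count++] = c;
        }
        return new string(buffer, 0, count);
    }
}
```

Both variants touch each character once and allocate the result string a single time, which is the property the concatenating loop lacks. Which one is faster on a full novel is something to measure rather than assume.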

Looping through every character once is not that bad. You're doing it all in one pass, that's not trivial to avoid.
The problem lies in the fact that the framework will need to allocate a new copy of the (partial) string whenever you do something like
result += s[i];
You can avoid that by introducing a StringBuilder to append non-punctuation characters as you go.
public string removePunctuation(string s)
{
    var result = new StringBuilder();
    for (int i = 0; i < s.Length; i++) {
        if (Char.IsWhiteSpace(s[i])) {
            result.Append(" ");
        } else if (!Char.IsLetter(s[i]) && !Char.IsNumber(s[i])) {
            // do nothing
        } else {
            result.Append(s[i]);
        }
    }
    return result.ToString();
}
You could further reduce the number of necessary Append calls with a refined algorithm, for example by looking ahead to the next punctuation character and appending larger portions at once, or use an existing string manipulation library like Regex. But the introduction of StringBuilder above should already give you a noticeable performance gain.
"I was never taught Regex in depth"
Use the search provider of your choice, you may end up with a tested solution which you can just study and use: https://stackoverflow.com/a/5871826/1132334

You can use Regex to remove punctuation as below.
public string removePunctuation(string s)
{
    string result = Regex.Replace(s, @"[^\w\s]", "");
    return result;
}
[^...] means: any character that is not in this set.
\w means: word characters (letters, digits, and the underscore).
\s means: whitespace characters.
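One caveat is that the pattern above keeps underscores and leaves tabs and newlines as they are, whereas the loop in the question drops underscores and turns every whitespace character into a space. If that exact behaviour matters, a variant along these lines might be closer. This is only a sketch; the class and field names are invented here.

```csharp
using System.Text.RegularExpressions;

public static class PunctuationFilter
{
    // \p{L} = letters, \p{N} = numbers, \s = whitespace; anything else is removed.
    private static readonly Regex NotKept =
        new Regex(@"[^\p{L}\p{N}\s]", RegexOptions.Compiled);

    // Matches each single whitespace character so it can be replaced by a space.
    private static readonly Regex Whitespace =
        new Regex(@"\s", RegexOptions.Compiled);

    public static string RemovePunctuation(string s)
    {
        string stripped = NotKept.Replace(s, "");
        return Whitespace.Replace(stripped, " ");
    }
}
```

Creating the Regex objects once and reusing them avoids re-parsing the pattern on every call, which can matter if the method is called for many strings.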

How can I match and return multiple instances of a string, where single apostrophes could be contained at any index?

Please note, the 'C#' tag was included intentionally, because I could accept C# syntax for my answer here, as I have the option of doing this both client-side and server-side. Read the 'Things You May Want To Know' section below. Also, the 'regex' tag was included because there is a strong possibility that the use of regular expressions is the best approach to this problem.
I have the following highlight Plug-In found here:
http://johannburkard.de/blog/programming/javascript/highlight-javascript-text-higlighting-jquery-plugin.html
And here is the code in that plug-in:
/*
highlight v4
Highlights arbitrary terms.
<http://johannburkard.de/blog/programming/javascript/highlight-javascript-text-higlighting-jquery-plugin.html>
MIT license.
Johann Burkard
<http://johannburkard.de>
<mailto:jb@eaio.com>
*/
jQuery.fn.highlight = function(pat) {
    function innerHighlight(node, pat) {
        var skip = 0;
        if (node.nodeType == 3) {
            var pos = node.data.toUpperCase().indexOf(pat);
            if (pos >= 0) {
                var spannode = document.createElement('span');
                spannode.className = 'highlight';
                var middlebit = node.splitText(pos);
                var endbit = middlebit.splitText(pat.length);
                var middleclone = middlebit.cloneNode(true);
                spannode.appendChild(middleclone);
                middlebit.parentNode.replaceChild(spannode, middlebit);
                skip = 1;
            }
        }
        else if (node.nodeType == 1 && node.childNodes && !/(script|style)/i.test(node.tagName)) {
            for (var i = 0; i < node.childNodes.length; ++i) {
                i += innerHighlight(node.childNodes[i], pat);
            }
        }
        return skip;
    }
    return this.length && pat && pat.length ? this.each(function() {
        innerHighlight(this, pat.toUpperCase());
    }) : this;
};
jQuery.fn.removeHighlight = function() {
    return this.find("span.highlight").each(function() {
        this.parentNode.firstChild.nodeName;
        with (this.parentNode) {
            replaceChild(this.firstChild, this);
            normalize();
        }
    }).end();
};
This plug-in works pretty easily.
If I wanted to highlight all instances of the word "Farm" within the following element...
<div id="myDiv">Farmers farm at Farmer's Market</div>
...all I would need to do is use:
$("#myDiv").highlight("farm");
And then it would highlight the first four characters in "Farmers" and "Farmer's", as well as the entire word "farm" within the div#myDiv
No problem there, but I would like it to use this:
$("#myDiv").highlight("Farmers");
And have it highlight both "Farmers" AND "Farmer's". The problem is, of course, that I don't know the value of the search term (the term "Farmers" in this example) until runtime. So I would need to detect all possibilities of no more than one apostrophe at each index of the string. For instance, if I called $("#myDiv").highlight("Farmers"); as in my code example above, I would also need to highlight each instance of the original string, plus:
'Farmers
F'armers
Fa'rmers
Far'mers
Farm'ers
Farme'rs
Farmer's
Farmers'
Instances where two or more apostrophes are found side-by-side, like "Fa''rmers", should, of course, not be highlighted.
I suppose it would be nice if I could include (to be highlighted) words like "Fa'rmer's", but I won't push my luck, and I would be doing well just to get matches like those found in my bulleted list above, where only one apostrophe appears in the string, at all.
I thought about regex, but I don't know the syntax that well, not to mention that I don't think I could do anything with a true/false return value.
Is there any way to accomplish what I need here?
Things You May Want To Know:
The highlight plug-in takes care of all the case insensitive requirements I need, so no need to worry about that, at all.
Syntax provided in JavaScript, jQuery, or even C# is acceptable, considering the hidden input fields I use the values from, client-side, are populated, server-side, with my C# code.
The C# code that populates the hidden input fields uses Razor (i.e., I am in a C#.Net Web-Pages w/ WebMatrix environment). This code is very simple, however, and looks like this:
for (var n = 0; n < searchTermsArray.Length; n++)
{
    <input class="highlightTerm" type="hidden" value="@searchTermsArray[n]" />
}

I'm copying this answer from your earlier question.
I think after reading the comments on the other answers, I've figured out what it is you're going for. You don't need a single regex that can do this for any possible input, you already have input, and you need to build a regex that matches it and its variations. What you need to do is this. To be clear, since you misinterpreted in your question, the following syntax is actually in JavaScript.
var re = new RegExp("'?" + "farmers".split("").join("'?") + "'?", "i")
What this does is take your input string, "farmers" and split it into a list of the individual characters.
"farmers".split("") == [ 'f', 'a', 'r', 'm', 'e', 'r', 's' ]
It then stitches the characters back together again with "'?" between them. In a regular expression, this means that the ' character will be optional. I add the same particle to the beginning and end of the expression to match at the beginning and end of the string as well.
This will create a regex that matches in the way you're describing, provided it's OK that it also matches the original string.
In this case, the above line builds this regex:
/'?f'?a'?r'?m'?e'?r'?s'?/i
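Since the question allows a server-side answer as well, the same pattern could be built in C# before it is written into the hidden fields. The sketch below is only an illustration: the class and method names are invented, and it escapes each character so that a search term containing regex metacharacters is still matched literally.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class ApostropheTolerantPattern
{
    // Builds "'?f'?a'?r'?m'?e'?r'?s'?" from "farmers": an optional apostrophe
    // before, between, and after the characters of the search term.
    public static string Build(string term)
    {
        var escaped = term.Select(c => Regex.Escape(c.ToString()));
        return "'?" + string.Join("'?", escaped) + "'?";
    }
}
```

A call such as Regex.IsMatch("Farmer's Market", ApostropheTolerantPattern.Build("Farmers"), RegexOptions.IgnoreCase) then succeeds for both "Farmers" and "Farmer's". An empty term produces a pattern that matches everywhere, so it is worth guarding against that case before using the result.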
EDIT
After looking at this a bit, and the function you're using, I think your best bet will be to modify the highlight function to use a regex instead of a straight string replacement. I don't think it'll even be that hard to deal with. Here's a completely untested stab at it.
function innerHighlight(node, pat) {
    var skip = 0;
    if (node.nodeType == 3) {
        var matchResult = pat.exec(node.data); // exec the regex instead of toUpperCase-ing the string
        var pos = matchResult !== null ? matchResult.index : -1; // index is the location where the matching text was found
        if (pos >= 0) {
            var spannode = document.createElement('span');
            spannode.className = 'highlight';
            var middlebit = node.splitText(pos);
            var endbit = middlebit.splitText(matchResult[0].length); // matchResult[0] is the text that was matched
            var middleclone = middlebit.cloneNode(true);
            spannode.appendChild(middleclone);
            middlebit.parentNode.replaceChild(spannode, middlebit);
            skip = 1;
        }
    }
    else if (node.nodeType == 1 && node.childNodes && !/(script|style)/i.test(node.tagName)) {
        for (var i = 0; i < node.childNodes.length; ++i) {
            i += innerHighlight(node.childNodes[i], pat);
        }
    }
    return skip;
}
What I'm attempting to do here is keep the existing logic, but use the Regex that I built to do the finding and splitting of the string. Note that I'm not doing the toUpper call anymore, but that I've made the regex case insensitive instead. As noted, I didn't test this at all, but it seems like it should be pretty close to a working solution. Enough to get you started anyway.
Note that this won't get you your hidden fields. I'm not sure what you need those for, but this will (if it's right) take care of highlighting the string.

Efficient determination of which strings in an array are substrings of the others?

In C#, say you have an array of strings that contain only the characters '0' and '1':
string[] input = { "0101", "101", "11", "010101011" };
And you'd like to build a function:
public void IdentifySubstrings(string[] input) { ... }
That will produce the following:
"0101 is a substring of 010101011"
"101 is a substring of 0101"
"101 is a substring of 010101011"
"11 is a substring of 010101011"
And you are NOT able to use built-in string functionality (such as String.Substring).
How would one efficiently solve this problem? Of course you could plow through it via brute force, but it just feels like there ought to be a way to accomplish it with a tree (since the only values are 0's and 1's, it feels like a binary tree ought to fit somehow). I've read a little bit about things like suffix trees, but I'm uncertain if that's the right path to be going down.
Any efficient solutions you can think of?

First of all, you have no choice but to examine each byte (or bit ;-) in the searched string at least once. It is probably best to leave them as bytes. Then implement a trie (or a variant). Load all substrings into the trie. The node objects should contain members identifying which of the loaded array elements they belong to. Then search it with each substring and make your matches.
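One possible reading of the trie idea above, as a sketch rather than a definitive implementation: load every input string into a two-way trie, then walk the trie from every start position of every string and report each input that ends at a node reached along the way. The class and member names are made up, the input is assumed to contain only '0' and '1', and only indexing and Length are used for the matching itself (string concatenation appears only when formatting the output).

```csharp
using System;
using System.Collections.Generic;

public static class BinarySubstringFinder
{
    private sealed class Node
    {
        public readonly Node[] Children = new Node[2];          // [0] for '0', [1] for '1'
        public readonly List<int> EndsHere = new List<int>();   // indices of inputs ending at this node
    }

    public static List<string> IdentifySubstrings(string[] input)
    {
        // 1. Load every input string into the trie.
        var root = new Node();
        for (int k = 0; k < input.Length; k++)
        {
            Node node = root;
            foreach (char c in input[k])
            {
                int bit = c - '0';
                if (node.Children[bit] == null)
                    node.Children[bit] = new Node();
                node = node.Children[bit];
            }
            node.EndsHere.Add(k);
        }

        // 2. Walk the trie from every start position of every string.
        var results = new List<string>();
        var reported = new bool[input.Length, input.Length];    // avoids duplicate lines
        for (int h = 0; h < input.Length; h++)
        {
            string hay = input[h];
            for (int start = 0; start < hay.Length; start++)
            {
                Node node = root;
                for (int i = start; i < hay.Length; i++)
                {
                    node = node.Children[hay[i] - '0'];
                    if (node == null)
                        break;                                   // no input continues this way
                    foreach (int k in node.EndsHere)
                    {
                        if (k != h && !reported[k, h])
                        {
                            reported[k, h] = true;
                            results.Add(input[k] + " is a substring of " + hay);
                        }
                    }
                }
            }
        }
        return results;
    }
}
```

For the four strings in the question this produces the four expected lines, though in a different order, because the results are grouped by the string being searched. The walk from each start position stops as soon as no loaded string shares the prefix, which is where the trie saves work over comparing every pair directly.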

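Haven't tested this, but it's close
// Brute-force check that uses only indexing and Length.
static bool IsSubstring(string string2Find, string string2LookIn)
{
    var string2FindLen = string2Find.Length;
    for (var start = 0; start + string2FindLen <= string2LookIn.Length; start++)
    {
        var ndx = 0;
        // Advance while the characters keep matching.
        while (ndx < string2FindLen && string2LookIn[start + ndx] == string2Find[ndx])
            ndx++;
        if (ndx == string2FindLen) return true;
    }
    return false;
}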

C# Efficient Substring with many inputs

Assuming I do not want to use external libraries or more than a dozen or so extra lines of code (i.e. clear code, not code golf code), can I do better than string.Contains to handle a collection of input strings and a collection of keywords to check for?
Obviously one can use objString.Contains(objString2) to do a simple substring check. However, there are many well-known algorithms which are able to do better than this under special circumstances, particularly if one is working with multiple strings. But sticking such an algorithm into my code would probably add length and complexity, so I'd rather use some sort of shortcut based on a built in function.
E.g. an input would be a collection of strings, a collection of positive keywords, and a collection of negative keywords. Output would be a subset of the first collection: the strings that contain at least 1 positive keyword but 0 negative keywords.
Oh, and please don't mention regular expressions as a suggested solution.
It may be that my requirements are mutually exclusive (not much extra code, no external libraries or regex, better than String.Contains), but I thought I'd ask.
Edit:
A lot of people are only offering silly improvements that won't beat an intelligently used call to contains by much, if anything. Some people are trying to call Contains more intelligently, which completely misses the point of my question. So here's an example of a problem to try solving. LBushkin's solution is an example of someone offering a solution that probably is asymptotically better than standard contains:
Suppose you have 10,000 positive keywords of length 5-15 characters, 0 negative keywords (this seems to confuse people), and one 1,000,000-character string. Check if the 1,000,000-character string contains at least 1 of the positive keywords.
I suppose one solution is to create an FSA. Another is to delimit on spaces and use hashes.

Your discussion of "negative and positive" keywords is somewhat confusing - and could use some clarification to get more complete answers.
As with all performance related questions - you should first write the simple version and then profile it to determine where the bottlenecks are - these can be unintuitive and hard to predict. Having said that...
One way to optimize the search (if you are always searching for "words" and not phrases that could contain spaces) would be to build a search index from your string.
The search index could be either a sorted array (for binary search) or a dictionary. A dictionary would likely prove faster, both because dictionaries are hash maps internally with O(1) lookup, and because a dictionary naturally eliminates duplicate values in the search source, reducing the number of comparisons you need to perform.
The general search algorithm is:
For each string you are searching against:
Take the string you are searching within and tokenize it into individual words (delimited by whitespace)
Populate the tokens into a search index (either a sorted array or dictionary)
Search the index for your "negative keywords", if one is found, skip to the next search string
Search the index for your "positive keywords"; when one is found, add it to a dictionary of found keywords (you could also track a count of how often the word appears)
Here's an example using a sorted array and binary search in C# 2.0:
NOTE: You could switch from string[] to List<string> easily enough, I leave that to you.
string[] FindKeyWordOccurence( string[] stringsToSearch,
                               string[] positiveKeywords,
                               string[] negativeKeywords )
{
    Dictionary<string,int> foundKeywords = new Dictionary<string,int>();
    foreach( string searchIn in stringsToSearch )
    {
        // tokenize and sort the input to make searches faster
        string[] tokenizedList = searchIn.Split( ' ' );
        Array.Sort( tokenizedList );
        // if any negative keywords exist, skip to the next search string...
        bool hasNegative = false;
        foreach( string negKeyword in negativeKeywords )
        {
            if( Array.BinarySearch( tokenizedList, negKeyword ) >= 0 )
            {
                hasNegative = true;
                break;
            }
        }
        // a bare "continue" inside the inner loop would only advance that loop,
        // so the decision is made here, in the outer loop
        if( hasNegative )
            continue; // skip to next search string...
        // for each positive keyword, add to dictionary to keep track of it
        // we could have also used a SortedList, but the dictionary is easier
        foreach( string posKeyword in positiveKeywords )
            if( Array.BinarySearch( tokenizedList, posKeyword ) >= 0 )
                foundKeywords[posKeyword] = 1;
    }
    // convert the Keys in the dictionary (our found keywords) to an array...
    string[] foundKeywordsArray = new string[foundKeywords.Keys.Count];
    foundKeywords.Keys.CopyTo( foundKeywordsArray, 0 );
    return foundKeywordsArray;
}
Here's a version that uses a dictionary-based index and LINQ in C# 3.0:
NOTE: This is not the most LINQ-y way to do it, I could use Union() and SelectMany() to write the entire algorithm as a single big LINQ statement - but I find this to be easier to understand.
public IEnumerable<string> FindOccurences( IEnumerable<string> searchStrings,
                                           IEnumerable<string> positiveKeywords,
                                           IEnumerable<string> negativeKeywords )
{
    var foundKeywordsDict = new Dictionary<string, int>();
    foreach( var searchIn in searchStrings )
    {
        // tokenize the search string...
        // Distinct() is needed because ToDictionary throws on duplicate keys
        var tokenizedDictionary = searchIn.Split( ' ' ).Distinct().ToDictionary( x => x );
        // skip if any negative keywords exist...
        if( negativeKeywords.Any( k => tokenizedDictionary.ContainsKey( k ) ) )
            continue;
        // merge found positive keywords into dictionary...
        // an example of where Enumerable.ForEach() would be nice...
        var found = positiveKeywords.Where( k => tokenizedDictionary.ContainsKey( k ) );
        foreach (var keyword in found)
            foundKeywordsDict[keyword] = 1;
    }
    return foundKeywordsDict.Keys;
}

If you add this extension method:
public static bool ContainsAny(this string testString, IEnumerable<string> keywords)
{
    foreach (var keyword in keywords)
    {
        if (testString.Contains(keyword))
            return true;
    }
    return false;
}
Then this becomes a one line statement:
var results = testStrings.Where(t => !t.ContainsAny(badKeywordCollection)).Where(t => t.ContainsAny(goodKeywordCollection));
This isn't necessarily any faster than doing the contains checks, except that it will do them efficiently, due to LINQ's streaming of results preventing any unnecessary contains calls.... Plus, the resulting code being a one liner is nice.

If you're truly just looking for space-delimited words, this code would be a very simple implementation:
static void Main(string[] args)
{
    string sIn = "This is a string that isn't nearly as long as it should be " +
                 "but should still serve to prove an algorithm";
    string[] sFor = { "string", "as", "not" };
    Console.WriteLine(string.Join(", ", FindAny(sIn, sFor)));
}
private static string[] FindAny(string searchIn, string[] searchFor)
{
    HashSet<String> hsIn = new HashSet<string>(searchIn.Split());
    HashSet<String> hsFor = new HashSet<string>(searchFor);
    return hsIn.Intersect(hsFor).ToArray();
}
If you only wanted a yes/no answer (as I see now may have been the case) there's another method of hashset "Overlaps" that's probably better optimized for that:
private static bool FindAny(string searchIn, string[] searchFor)
{
    HashSet<String> hsIn = new HashSet<string>(searchIn.Split());
    HashSet<String> hsFor = new HashSet<string>(searchFor);
    return hsIn.Overlaps(hsFor);
}

Well, there is the Split() method you can call on a string. You could split your input strings into arrays of words using Split() then do a one-to-one check of words with keywords. I have no idea if or under what circumstances this would be faster than using Contains(), however.

First get rid of all the strings that contain negative words. I would suggest doing this using the Contains method. I would think that Contains() is faster than splitting, sorting, and searching.

Seems to me that the best way to do this is to take your match strings (both positive and negative) and compute a hash of each. Then march through your million-character string computing n hashes at each position (in your case 11, one for each keyword length from 5 to 15) and match them against the hashes of your match strings. If you get a hash match, then you do an actual string compare to rule out a false positive. There are a number of good ways to optimize this, such as bucketing your match strings by length and creating hashes based on the string size for a particular bucket.
So you get something like:
// BuildBuckets, ComputeHash, Bucket and MatchString are placeholders still to be written;
// the buckets are assumed to be sorted by ascending length.
IList<Bucket> buckets = BuildBuckets(matchStrings);
int shortestLength = buckets[0].Length;
for (int i = 0; i <= inputString.Length - shortestLength; i++) {
    foreach (Bucket b in buckets) {
        // skip buckets whose strings would run past the end of the input
        if (i + b.Length > inputString.Length)
            continue;
        string candidate = inputString.Substring(i, b.Length);
        int hash = ComputeHash(candidate);
        foreach (MatchString match in b.MatchStrings) {
            if (hash != match.Hash)
                continue;
            if (candidate == match.String) {
                if (match.IsPositive) {
                    // positive case
                }
                else {
                    // negative case
                }
            }
        }
    }
}
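A compact variant of the same bucketing idea, kept within the "dozen or so lines" the question asks for, might lean on HashSet<string>, which already hashes a candidate first and only compares strings on a hash hit. This is a sketch under assumptions: the class and method names are invented, and it allocates one substring per window, which a hand-rolled rolling hash would avoid.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeywordScanner
{
    // Returns true if the input contains at least one of the keywords.
    public static bool ContainsAny(string input, IEnumerable<string> keywords)
    {
        // One bucket per keyword length, so each window is hashed once per length
        // instead of being compared against every keyword.
        Dictionary<int, HashSet<string>> byLength = keywords
            .Where(k => !string.IsNullOrEmpty(k))
            .GroupBy(k => k.Length)
            .ToDictionary(g => g.Key, g => new HashSet<string>(g));

        for (int i = 0; i < input.Length; i++)
        {
            foreach (KeyValuePair<int, HashSet<string>> bucket in byLength)
            {
                int length = bucket.Key;
                if (i + length > input.Length)
                    continue;   // this window would run past the end of the input
                if (bucket.Value.Contains(input.Substring(i, length)))
                    return true;
            }
        }
        return false;
    }
}
```

For the example in the question (10,000 keywords of length 5-15 and a 1,000,000-character string) this performs at most 11 set lookups per position, regardless of how many keywords there are. Whether that actually beats repeated Contains calls is worth profiling on real data.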

To optimize Contains(), you need a tree (or trie) structure of your positive/negative words.
That should speed up everything (O(n) vs O(nm), n=size of string, m=avg word size) and the code is relatively small & easy.

Recursive woes - reducing an input string

I'm working on a portion of code that is essentially trying to reduce a list of strings down to a single string recursively.
I have an internal database built up of matching string arrays of varying length (say array lengths of 2-4).
An example input string array would be:
{"The", "dog", "ran", "away"}
And for further example, my database could be made up of string arrays in this manner:
(length 2) {{"The", "dog"},{"dog", "ran"}, {"ran", "away"}}
(length 3) {{"The", "dog", "ran"}.... and so on
So, what I am attempting to do is recursively reduce my input string array down to a single token. So ideally it would parse something like this:
1) {"The", "dog", "ran", "away"}
Say that (seq1) = {"The", "dog"} and (seq2) = {"ran", "away"}
2) { (seq1), "ran", "away"}
3) { (seq1), (seq2)}
In my sequence database I know that, for instance, seq3 = {(seq1), (seq2)}
4) { (seq3) }
So, when it is down to a single token, I'm happy and the function would end.
Here is an outline of my current program logic:
public void Tokenize(Arraylist<T> string_array, int current_size)
{
    // retrieve all known sequences of length [current_size] (from global list array)
    loc_sequences_by_length = sequences_by_length[current_size-min_size]; // sequences of length 2 are stored in position 0 and so on
    // escape cases
    if (string_array.Count == 1)
    {
        // finished successfully
        return;
    }
    else if (string_array.Count < current_size)
    {
        // checking sequences of greater length than input string, bail
        return;
    }
    else
    {
        // split input string into chunks of size [current_size] and compare to local database
        // of known sequences
        // (splitting code works fine)
        foreach (comparison)
        {
            if (match_found)
            {
                // update input string and recall function to find other matches
                string_array[found_array_position] = new_sequence;
                string_array.Removerange[found_array_position+1, new_sequence.Length-1];
                Tokenize(string_array, current_size)
            }
        }
    }
    // ran through unsuccessfully, increment length and try again for new sequence group
    current_size++;
    if (current_size > MAX_SIZE)
        return;
    else
        Tokenize(string_array, current_size);
}
I thought it was straightforward enough, but have been getting some strange results.
Generally it appears to work, but upon further review of my output data I'm seeing some issues. Mainly, it appears to work up to a certain point...and at that point my 'current_size' counter resets to the minimum value.
So it is called with a size of 2, then 3, then 4, then resets to 2.
My assumption was that it would run up to my predetermined max size, and then bail completely.
I tried to simplify my code as much as possible, so there are probably some simple syntax errors in transcribing. If there is any other detail that may help an eagle-eyed SO user, please let me know and I'll edit.
Thanks in advance

One bug is:
string_array[found_array_position] = new_sequence;
I don't know where new_sequence is defined, and as far as I can tell, if it is defined, it is never changed.
In your if statement, when is match_found ever set to true?
Also, it appears you have an extra close brace here, but you may want the last block of code to be outside of the function:
}
}
}
It would help if you cleaned up the code, to make it easier to read. Once we get past the syntactic errors it will be easier to see what is going on, I think.

Not sure what all the issues are, but the first thing I'd do is have your "catch-all" exit block right at the beginning of your method.
public void Tokenize(Arraylist<T> string_array, int current_size)
{
    if (current_size > MAX_SIZE)
        return;
    // Guts go here
    Tokenize(string_array, ++current_size);
}
A couple things:
Your tokens are not clearly separated from your input string values. This makes it more difficult to handle, and to see what's going on.
It looks like you're writing pseudo-code:
loc_sequences_by_length is not used
found_array_position is not defined
Arraylist should be ArrayList.
etc.
Overall I agree with James' statement:
"It would help if you cleaned up the code, to make it easier to read."
-Doug
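One likely reason for the "resets to 2" behaviour in the question is that every recursive call has its own copy of current_size: when a nested call that has counted up to the maximum returns, the caller carries on with its own, smaller value. A loop sidesteps this. The sketch below is an illustration under assumptions, not the asker's actual code: the names are invented, the database is modelled as a dictionary from a joined sequence to its token, the minimum size is assumed to be at least 2, and after every replacement the search restarts at the smallest size.

```csharp
using System;
using System.Collections.Generic;

public static class SequenceReducer
{
    // Hypothetical key format: the items of a sequence joined by a separator
    // character that is assumed never to occur in a token.
    public static string Key(params string[] parts)
    {
        return string.Join("\u001f", parts);
    }

    public static List<string> Reduce(List<string> tokens,
                                      IDictionary<string, string> knownSequences,
                                      int minSize, int maxSize)
    {
        var current = new List<string>(tokens);
        int size = minSize;
        while (current.Count > 1 && size <= maxSize)
        {
            bool replaced = false;
            for (int start = 0; start + size <= current.Count; start++)
            {
                string key = string.Join("\u001f", current.GetRange(start, size));
                string token;
                if (knownSequences.TryGetValue(key, out token))
                {
                    // Collapse the matched run into its single token.
                    current.RemoveRange(start, size);
                    current.Insert(start, token);
                    replaced = true;
                    break;
                }
            }
            // After a replacement, start again at the smallest size;
            // otherwise move on to longer sequences.
            size = replaced ? minSize : size + 1;
        }
        return current;
    }
}
```

With "The dog" mapped to seq1, "ran away" to seq2, and "seq1 seq2" to seq3, the input {"The", "dog", "ran", "away"} reduces to the single token seq3. Because each replacement shortens the list and the size counter only grows between replacements, the loop always terminates, and there is one counter instead of one per stack frame.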
