Parsing routine for over-sign characters relatively slow - c#

I am trying to make the enclosed subroutine more performant using .NET Framework 4.6.1, although I will eventually port it to .NET Core 2.2.
It may run up to 250,000 times when parsing a file.
Using Visual Studio Performance Analyzer I see this routine seems to have a fairly high relative cost in the whole parsing process.
The code is part of a parsing program whose input is a binary file that contains some very old record formats that contain over-signed numbers.
Over-signed Numbers (background)
To save space, a negative number carries no minus sign; instead, its last digit is replaced by a letter. It's a very old standard dating back to when computers had limited memory and fixed-width records were required for performance.
When parsing, I need to convert the last letter back to a digit and make the number negative.
Some examples of input and output of the routine
00056K = -562
00032N = -325
Current Code (slow)
private int ConvertOverSign(string overSignedString)
{
switch(overSignedString.Substring(overSignedString.Length -1,1))
{
case " ":
return 0;
case "J":
return -Convert.ToInt32(overSignedString.Substring(0,overSignedString.Length -1) + "1");
case "K":
return -Convert.ToInt32(overSignedString.Substring(0,overSignedString.Length -1) + "2");
case "L":
return -Convert.ToInt32(overSignedString.Substring(0,overSignedString.Length -1) + "3");
case "M":
return -Convert.ToInt32(overSignedString.Substring(0,overSignedString.Length -1) + "4");
case "N":
return -Convert.ToInt32(overSignedString.Substring(0,overSignedString.Length -1) + "5");
case "O":
return -Convert.ToInt32(overSignedString.Substring(0,overSignedString.Length -1) + "6");
case "P":
return -Convert.ToInt32(overSignedString.Substring(0,overSignedString.Length -1) + "7");
case "Q":
return -Convert.ToInt32(overSignedString.Substring(0,overSignedString.Length -1) + "8");
case "R":
return -Convert.ToInt32(overSignedString.Substring(0,overSignedString.Length -1) + "9");
case "!":
return -Convert.ToInt32(overSignedString.Substring(0,overSignedString.Length -1) + "0");
default:
return Convert.ToInt32(overSignedString);
}
}

Not sure the below solution is fully equivalent to yours, but at least should give you a hint on how to make a very fast string-to-number parser.
private int ConvertOverSign(string overSignedString)
{
if (overSignedString == " ") return 0;
int value = 0;
for (int i = 0; i < overSignedString.Length; i++)
{
char ch = overSignedString[i];
switch (ch)
{
case '0':
case '1':
case '2':
case '3':
case '4':
case '5':
case '6':
case '7':
case '8':
case '9':
value = value * 10 + (ch - 0x30);
break;
case '!':
value *= 10;
return -value;
case 'J':
case 'K':
case 'L':
case 'M':
case 'N':
case 'O':
case 'P':
case 'Q':
case 'R':
value = value * 10 + (ch - 'I');
return -value;
}
}
return value;
}
Bear in mind that string manipulations (e.g. Substring) are typically heavy if you need performance.

Switch over the indexed character. Substring actually allocates a new string, and that is slow:
switch (overSignedString[overSignedString.Length - 1])
{
case ' ':
return 0;
case 'J':
return ...
You might want to read this to see if it's worth parsing the string inside each case and avoiding Convert. There are faster ways.
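One way to avoid both Substring and Convert is sketched below. It is only a sketch, not a drop-in replacement: the class name OverSignSketch and the method Parse are made up for illustration, and it assumes every character except the last is an ASCII digit (the original default branch, Convert.ToInt32, also tolerates whitespace and signs, which this version rejects).

```csharp
using System;

public static class OverSignSketch
{
    // Hypothetical helper: accumulates the leading digits by hand, then lets
    // the final character decide the last digit and the sign.
    public static int Parse(string s)
    {
        int last = s.Length - 1;
        char tail = s[last];
        if (tail == ' ') return 0; // same special case as the original routine

        int value = 0;
        for (int i = 0; i < last; i++)
        {
            char c = s[i];
            if (c < '0' || c > '9')
                throw new FormatException("Unexpected character: " + c);
            value = value * 10 + (c - '0');
        }

        // Plain positive number: the last character is an ordinary digit.
        if (tail >= '0' && tail <= '9') return value * 10 + (tail - '0');
        // 'J'..'R' stand for a negative number ending in 1..9.
        if (tail >= 'J' && tail <= 'R') return -(value * 10 + (tail - 'I'));
        // '!' stands for a negative number ending in 0.
        if (tail == '!') return -(value * 10);
        throw new FormatException("Unexpected over-sign character: " + tail);
    }
}
```

With the examples from the question, OverSignSketch.Parse("00056K") gives -562 and OverSignSketch.Parse("00032N") gives -325. Whether this is measurably faster in your parser is something the profiler would have to confirm.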

Your method is slow because it generates a lot of string garbage.
You could improve it by comparing characters instead of strings, multiplying the resulting integer instead of appending strings, and using a lookup instead of a switch:
private Dictionary<char, int> _additions = new Dictionary<char, int>
{
{ '!', 0 },
{ 'J', 1 },
{ 'K', 2 },
{ 'L', 3 },
{ 'M', 4 },
{ 'N', 5 },
{ 'O', 6 },
{ 'P', 7 },
{ 'Q', 8 },
{ 'R', 9 },
};
private int ConvertOverSign(string overSignedString)
{
var lastChar = overSignedString[overSignedString.Length -1];
if (lastChar == ' ')
{
return 0;
}
if (!_additions.TryGetValue(lastChar, out int addition))
{
return Convert.ToInt32(overSignedString);
}
var result = (Convert.ToInt32(overSignedString.Substring(0, overSignedString.Length - 1)) * -10) - addition;
return result;
}

Related

Optimal Compare Algorithm for finding string matches in List of strings C#

Say I have a list of 100,000 words. I want to find out if a given string matches any words in that list, and I want to do it in the fastest way possible. Also I want to know if any other words, that are formed by starting with the first character, in that string appear in the list.
For example:
Say you have the string "icedtgg"
"i"
"ic"
"ice"
"iced"
"icedt"
"icedtg"
"icedtgg"
I am trying to come up with an optimal compare algorithm that tells me if the following words are in my list.
What I have so far is my list of 100,000 words are stored in a
Dictionary<char, List<string>> WordList;
where char is the first character of the word, and the List<string> is all of the words that start with that character.
So WordList['a']
has a list of all words that start with 'a' ("ape", "apple", "art" etc.) and 'b' has a list of all words that start with b etc.
Since I know that all of my words start with 'i', I can first narrow my solution down from 100,000 words to just the words that start with 'i'.
List<string> CurrentWordList = WordList['i'];
Now I check
if( CurrentWordList[0].Length == 1 )
Then I know my first string "i" is a match, because "i" will be the first word in the list. These lists are sorted alphabetically beforehand, so as not to slow down the matching.
Any ideas?
*No, this is not a HW assignment. I am a professional Software Architect trying to find an optimal match algorithm for fun/hobby/game development.
I decided to add this answer not because it is the optimal solution to your problem, but to illustrate two possible solutions that are relatively simple and that are somewhat in line with the approach you seem to be following yourself.
The (non-optimized) sample below provides an extremely simple prefix trie implementation, that uses a node per consumed character.
public class SimplePrefixTrie
{
private readonly Node _root = new Node(); // root represents empty string.
private class Node
{
public Dictionary<char, Node> Children;
public bool IsTerminal; // whether a full word ends here.
public Node Find(string word, int index)
{
var child = default(Node);
if (index < word.Length && Children != null)
Children.TryGetValue(word[index], out child);
return child;
}
public Node Add(string word, int toConsume)
{
var child = default(Node);
if (toConsume == word.Length)
this.IsTerminal = true;
else if (Children == null || !Children.TryGetValue(word[toConsume], out child))
{
if (Children == null)
Children = new Dictionary<char, Node>();
Children[word[toConsume]] = child = new Node();
}
return child;
}
}
public void AddWord(string word)
{
var ndx = 0;
var cur = _root;
while (cur != null)
cur = cur.Add(word, ndx++);
}
public IEnumerable<string> FindWordsMatchingPrefixesOf(string searchWord)
{
var ndx = 0;
var cur = _root;
while (cur != null)
{
if (cur.IsTerminal)
yield return searchWord.Substring(0, ndx);
cur = cur.Find(searchWord, ndx++);
}
}
}
A simple implementation of a compressed prefix trie is also added below. It follows an almost identical approach to the sample above, but stores shared prefix parts, instead of single characters. It splits nodes when an existing stored prefix becomes shared and needs to be split into two parts.
public class SimpleCompressedPrefixTrie
{
private readonly Node _root = new Node();
private class Node
{
private Dictionary<char, Node> _children;
public string PrefixValue = string.Empty;
public bool IsTerminal;
public Node Add(string word, ref int startIndex)
{
var n = FindSharedPrefix(word, startIndex);
startIndex += n;
if (n == PrefixValue.Length) // full prefix match
{
if (startIndex == word.Length) // full match
IsTerminal = true;
else
return AddToChild(word, ref startIndex);
}
else // partial match, need to split this node's prefix.
SplittingAdd(word, n, ref startIndex);
return null;
}
public Node Find(string word, ref int startIndex, out int matchLen)
{
var n = FindSharedPrefix(word, startIndex);
startIndex += n;
matchLen = -1;
if (n == PrefixValue.Length)
{
if (IsTerminal)
matchLen = startIndex;
var child = default(Node);
if (_children != null && startIndex < word.Length && _children.TryGetValue(word[startIndex], out child))
{
startIndex++; // consumed map key character.
return child;
}
}
return null;
}
private Node AddToChild(string word, ref int startIndex)
{
var key = word[startIndex++]; // consume the mapping character
var nextNode = default(Node);
if (_children == null)
_children = new Dictionary<char, Node>();
else if (_children.TryGetValue(key, out nextNode))
return nextNode;
var remainder = word.Substring(startIndex);
_children[key] = new Node() { PrefixValue = remainder, IsTerminal = true };
return null; // consumed.
}
private void SplittingAdd(string word, int n, ref int startIndex)
{
var curChildren = _children;
_children = new Dictionary<char, Node>();
_children[PrefixValue[n]] = new Node()
{
PrefixValue = this.PrefixValue.Substring(n + 1),
IsTerminal = this.IsTerminal,
_children = curChildren
};
PrefixValue = PrefixValue.Substring(0, n);
IsTerminal = startIndex == word.Length;
if (!IsTerminal)
{
var prefix = word.Length > startIndex + 1 ? word.Substring(startIndex + 1) : string.Empty;
_children[word[startIndex]] = new Node() { PrefixValue = prefix, IsTerminal = true };
startIndex++;
}
}
private int FindSharedPrefix(string word, int startIndex)
{
var n = Math.Min(PrefixValue.Length, word.Length - startIndex);
var len = 0;
while (len < n && PrefixValue[len] == word[len + startIndex])
len++;
return len;
}
}
public void AddWord(string word)
{
var ndx = 0;
var cur = _root;
while (cur != null)
cur = cur.Add(word, ref ndx);
}
public IEnumerable<string> FindWordsMatchingPrefixesOf(string searchWord)
{
var startNdx = 0;
var cur = _root;
while (cur != null)
{
var matchLen = 0;
cur = cur.Find(searchWord, ref startNdx, out matchLen);
if (matchLen > 0)
yield return searchWord.Substring(0, matchLen);
};
}
}
Usage examples:
var trie = new SimplePrefixTrie(); // or new SimpleCompressedPrefixTrie();
trie.AddWord("hello");
trie.AddWord("iced");
trie.AddWord("i");
trie.AddWord("ice");
trie.AddWord("icecone");
trie.AddWord("dtgg");
trie.AddWord("hicet");
foreach (var w in trie.FindWordsMatchingPrefixesOf("icedtgg"))
Console.WriteLine(w);
With output:
i
ice
iced
UPDATE: Selecting the right data structure matters
I think an update could provide some value to illustrate how selecting a data structure that fits the problem well is important and what kinds of trade-offs are involved. Therefore I created a small benchmark application that tests the strategies in the answers provided to this question so far, versus a baseline reference implementation.
Naive: Is the simplest possible naive solution.
JimMischel: Is based on the approach from this answer.
MattyMerrix: Is based on your own answer here.
JimMattyDSL: Combines the 'JimMischel' and 'MattyMerrix' approaches and uses a more optimal binary string search in the sorted list.
SimpleTrie and CompessedTrie are based on the two implementations described in this answer.
The full benchmark code can be found in this gist. The results of running it with dictionaries of 10,000, 100,000, and 1,000,000 (randomly generated character sequence) words and searching for all prefix matches of 5,000 terms are:
Matching 5000 words to dictionary of 10000 terms of max length 25
Method         Memory (MB)                Build Time (s)          Lookup Time (s)
Naive          0.64-0.64, 0.64            0.001-0.002, 0.001      6.136-6.312, 6.210
JimMischel     0.84-0.84, 0.84            0.013-0.018, 0.016      0.083-0.113, 0.102
JimMattyDSL    0.80-0.81, 0.80            0.013-0.018, 0.016      0.008-0.011, 0.010
SimpleTrie     24.55-24.56, 24.56         0.042-0.056, 0.051      0.002-0.002, 0.002
CompessedTrie  1.84-1.84, 1.84            0.003-0.003, 0.003      0.002-0.002, 0.002
MattyMerrix    0.83-0.83, 0.83            0.017-0.017, 0.017      0.034-0.034, 0.034
Matching 5000 words to dictionary of 100000 terms of max length 25
Method         Memory (MB)                Build Time (s)          Lookup Time (s)
Naive          6.01-6.01, 6.01            0.024-0.026, 0.025      65.651-65.758, 65.715
JimMischel     6.32-6.32, 6.32            0.232-0.236, 0.233      1.208-1.254, 1.235
JimMattyDSL    5.95-5.96, 5.96            0.264-0.269, 0.266      0.050-0.052, 0.051
SimpleTrie     226.49-226.49, 226.49      0.932-0.962, 0.951      0.004-0.004, 0.004
CompessedTrie  16.10-16.10, 16.10         0.101-0.126, 0.111      0.003-0.003, 0.003
MattyMerrix    6.15-6.15, 6.15            0.254-0.269, 0.259      0.414-0.418, 0.416
Matching 5000 words to dictionary of 1000000 terms of max length 25
Method         Memory (MB)                Build Time (s)          Lookup Time (s)
JimMischel     57.69-57.69, 57.69         3.027-3.086, 3.052      16.341-16.415, 16.373
JimMattyDSL    60.88-60.88, 60.88         3.396-3.484, 3.453      0.399-0.400, 0.399
SimpleTrie     2124.57-2124.57, 2124.57   11.622-11.989, 11.860   0.006-0.006, 0.006
CompessedTrie  166.59-166.59, 166.59      2.813-2.832, 2.823      0.005-0.005, 0.005
MattyMerrix    62.71-62.73, 62.72         3.230-3.270, 3.251      6.996-7.015, 7.008
As you can see, memory required for the (non-space optimized) tries is substantially higher. It increases by the size of the dictionary, O(N) for all of the tested implementations.
As expected, lookup time for the tries is more or less constant: O(k), dependent on the length of the search terms only. For the other implementations, time will increase based on the size of the dictionary to be searched.
Note that far more optimal implementations for this problem can be constructed, that will be close to O(k) for search time and allow a more compact storage and reduced memory footprint. If you map to a reduced alphabet (e.g. 'A'-'Z' only), this is also something that can be taken advantage of.
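As a rough illustration of the reduced-alphabet idea, the sketch below replaces the per-node Dictionary with a fixed 26-slot array indexed by letter. The class name ArrayTrieSketch is hypothetical, and the sketch assumes words contain only the letters A-Z (case is folded). It was not part of the benchmark above, so no claim is made about how it would compare in memory or speed.

```csharp
using System;
using System.Collections.Generic;

public class ArrayTrieSketch
{
    private sealed class Node
    {
        public readonly Node[] Children = new Node[26]; // one slot per letter
        public bool IsTerminal;                         // whether a full word ends here
    }

    private readonly Node _root = new Node();

    // Maps 'a'..'z' and 'A'..'Z' to 0..25; returns -1 for anything else.
    private static int Slot(char c)
    {
        if (c >= 'a' && c <= 'z') return c - 'a';
        if (c >= 'A' && c <= 'Z') return c - 'A';
        return -1;
    }

    public void AddWord(string word)
    {
        var cur = _root;
        foreach (char c in word)
        {
            int slot = Slot(c);
            if (slot < 0) throw new ArgumentException("Only A-Z is supported in this sketch.");
            cur = cur.Children[slot] ?? (cur.Children[slot] = new Node());
        }
        cur.IsTerminal = true;
    }

    public List<string> FindWordsMatchingPrefixesOf(string searchWord)
    {
        var matches = new List<string>();
        var cur = _root;
        for (int i = 0; i < searchWord.Length; i++)
        {
            int slot = Slot(searchWord[i]);
            if (slot < 0) break;          // character outside the alphabet: stop
            cur = cur.Children[slot];
            if (cur == null) break;       // no stored word continues this way
            if (cur.IsTerminal) matches.Add(searchWord.Substring(0, i + 1));
        }
        return matches;
    }
}
```

The array lookup avoids hashing on every step, at the cost of 26 references per node even when most slots are empty, so it tends to suit dense dictionaries better than sparse ones.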
So you just want to find the words in the dictionary that are prefixes of the input string? You can do this much more efficiently than any of the methods proposed. It's really just a modified merge.
If your word list consists of a dictionary keyed by first letter, with each entry containing a sorted list of words that begin with that letter, then this will do it. Worst case is O(n + m), where n is the number of words that start with the letter, and m is the length of the input string.
var inputString = "icegdt";
// get list of words that start with the first character
var wordsList = MyDictionary[inputString[0]];
// find all words that are prefixes of the input string
var iInput = 0;
var iWords = 0;
var prefix = inputString.Substring(0, iInput+1);
while (iInput < inputString.Length && iWords < wordsList.Count)
{
if (wordsList[iWords] == prefix)
{
// wordsList[iWords] is found!
++iWords;
}
else if (string.CompareOrdinal(wordsList[iWords], prefix) > 0) // C# strings have no > operator; assumes the list is sorted ordinally
{
// The current word is alphabetically after the prefix.
// So we need the next character.
++iInput;
if (iInput < inputString.Length)
{
prefix = inputString.Substring(0, iInput+1);
}
}
else
{
// The prefix is alphabetically after the current word.
// Advance the current word.
++iWords;
}
}
If this is all you want to do (find dictionary words that are prefixes of the input string), then there's no particular reason for your dictionary indexed by first character. Given a sorted list of words, you could do a binary search on the first letter to find the starting point. That would take slightly more time than the dictionary lookup, but the time difference would be very small compared to the time spent searching the word list for matches. In addition, the sorted word list would take less memory than the dictionary approach.
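The binary search for the starting point could look roughly like the sketch below. The names SortedListSketch and FindStart are invented for illustration, and the code assumes the list is sorted with ordinal comparison and contains no empty strings.

```csharp
using System;
using System.Collections.Generic;

public static class SortedListSketch
{
    // Returns the index of the first word whose first character is >= firstChar,
    // or words.Count if there is no such word.
    public static int FindStart(List<string> words, char firstChar)
    {
        int lo = 0, hi = words.Count;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (words[mid][0] < firstChar)
                lo = mid + 1;   // everything up to mid starts with a smaller letter
            else
                hi = mid;       // mid could be the first match; keep it in range
        }
        return lo;
    }
}
```

The merge loop would then start with iWords set to the returned index rather than 0, and stop as soon as a word no longer begins with the first letter of the input.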
If you want to do case-insensitive comparisons, change the comparison code to:
var result = String.Compare(wordsList[iWords], prefix, true);
if (result == 0)
{
// wordsList[iWords] is found!
++iWords;
}
else if (result > 0)
{
That also reduces the number of string comparisons to exactly one per iteration.
while (x < str.Length-1)
{
if (ChrW(10) == GetChar(str, x) && ChrW(13) == GetChar(str, x+1))
{
// x+2 - This new line
}
x++;
}
Here is my first go at it; I wanted to get this out there in case I can't finish it today.
public class CompareHelper
{
//Should always be sorted in alphabetical order.
public static Dictionary<char, List<string>> MyDictionary;
public static List<string> CurrentWordList;
public static List<string> MatchedWordList;
//The word we are trying to find matches for.
public static char InitChar;
public static StringBuilder ThisWord;
/// <summary>
/// Initialize the Compare. Set the first character. See if there are any 1 letter words
/// for that character.
/// </summary>
/// <param name="firstChar">The first character in the word string.</param>
/// <returns>True if a word was found.</returns>
public static bool InitCompare(char firstChar)
{
InitChar = firstChar;
//Get all words that start with the firstChar.
CurrentWordList = MyDictionary[InitChar];
ThisWord = new StringBuilder();
ThisWord.Append(firstChar);
if (CurrentWordList[0].Length == 1)
{
//Match.
return true;
}
//No matches.
return false;
}
/// <summary>
/// Append this letter to our ThisWord. See if there are any matching words.
/// </summary>
/// <param name="nextChar">The next character in the word string.</param>
/// <returns>True if a word was found.</returns>
public static bool NextCompare(char nextChar)
{
ThisWord.Append(nextChar);
int currentIndex = ThisWord.Length - 1;
if (FindRemainingWords(nextChar, currentIndex))
{
if (CurrentWordList[0].Length == currentIndex)
{
//Match.
return true;
}
}
//No matches.
return false;
}
/// <summary>
/// Trim down our CurrentWordList until it only contains words
/// that at currIndex start with the currChar.
/// </summary>
/// <param name="currChar">The next letter in our ThisWord.</param>
/// <param name="currIndex">The index of the letter.</param>
/// <returns>True if there are words remaining in CurrentWordList.</returns>
private static bool FindRemainingWords(char currChar, int currIndex)
{
//Null check.
if (CurrentWordList == null || CurrentWordList.Count < 1)
{
return false;
}
bool doneSearching = false;
while(!doneSearching)
{
int middleIndex = CurrentWordList.Count / 2;
//TODO: test for CurrentWordList.count 2 or 1 ...
//TODO: test for wordToCheck.length < curr index
char middleLetter = CurrentWordList[middleIndex][currIndex];
LetterPositionEnum returnEnum = GetLetterPosition(currChar, middleLetter);
switch(returnEnum)
{
case LetterPositionEnum.Before:
CurrentWordList = CurrentWordList.GetRange(middleIndex, (CurrentWordList.Count - middleIndex));
break;
case LetterPositionEnum.PREV:
CurrentWordList = CurrentWordList.GetRange(middleIndex, (CurrentWordList.Count - middleIndex));
break;
case LetterPositionEnum.MATCH:
CurrentWordList = CurrentWordList.GetRange(middleIndex, (CurrentWordList.Count - middleIndex));
break;
case LetterPositionEnum.NEXT:
CurrentWordList = CurrentWordList.GetRange(0, middleIndex);
break;
case LetterPositionEnum.After:
CurrentWordList = CurrentWordList.GetRange(0, middleIndex);
break;
default:
break;
}
}
TrimWords(currChar, currIndex);
//Null check.
if (CurrentWordList == null || CurrentWordList.Count < 1)
{
return false;
}
//There are still words left in CurrentWordList.
return true;
}
//Trim all words in CurrentWordList
//that are LetterPositionEnum.PREV and LetterPositionEnum.NEXT
private static void TrimWords(char currChar, int currIndex)
{
int startIndex = 0;
int endIndex = CurrentWordList.Count;
bool startIndexFound = false;
//Loop through all of the words.
for ( int i = startIndex; i < endIndex; i++)
{
//If we havent found the start index then the first match of currChar
//will be the start index.
if( !startIndexFound && currChar == CurrentWordList[i][currIndex] )
{
startIndex = i;
startIndexFound = true;
}
//If we have found the start index then the next letter that isnt
//currChar will be the end index.
if( startIndexFound && currChar != CurrentWordList[i][currIndex])
{
endIndex = i;
break;
}
}
//Trim the words that dont start with currChar.
CurrentWordList = CurrentWordList.GetRange(startIndex, endIndex - startIndex); //GetRange takes (index, count), not (start, end).
}
//In order to find all words that begin with a given character, we should search
//for the last word that begins with the previous character (PREV) and the
//first word that begins with the next character (NEXT).
//Anything else Before or After that is trash and we will throw out.
public enum LetterPositionEnum
{
Before,
PREV,
MATCH,
NEXT,
After
};
//We want to ignore all letters that come before this one.
public static LetterPositionEnum GetLetterPosition(char currChar, char compareLetter)
{
switch (currChar)
{
case 'A':
switch (compareLetter)
{
case 'A': return LetterPositionEnum.MATCH;
case 'B': return LetterPositionEnum.NEXT;
case 'C': return LetterPositionEnum.After;
case 'D': return LetterPositionEnum.After;
case 'E': return LetterPositionEnum.After;
case 'F': return LetterPositionEnum.After;
case 'G': return LetterPositionEnum.After;
case 'H': return LetterPositionEnum.After;
case 'I': return LetterPositionEnum.After;
case 'J': return LetterPositionEnum.After;
case 'K': return LetterPositionEnum.After;
case 'L': return LetterPositionEnum.After;
case 'M': return LetterPositionEnum.After;
case 'N': return LetterPositionEnum.After;
case 'O': return LetterPositionEnum.After;
case 'P': return LetterPositionEnum.After;
case 'Q': return LetterPositionEnum.After;
case 'R': return LetterPositionEnum.After;
case 'S': return LetterPositionEnum.After;
case 'T': return LetterPositionEnum.After;
case 'U': return LetterPositionEnum.After;
case 'V': return LetterPositionEnum.After;
case 'W': return LetterPositionEnum.After;
case 'X': return LetterPositionEnum.After;
case 'Y': return LetterPositionEnum.After;
case 'Z': return LetterPositionEnum.After;
default: return LetterPositionEnum.After;
}
case 'B':
switch (compareLetter)
{
case 'A': return LetterPositionEnum.PREV;
case 'B': return LetterPositionEnum.MATCH;
case 'C': return LetterPositionEnum.NEXT;
case 'D': return LetterPositionEnum.After;
case 'E': return LetterPositionEnum.After;
case 'F': return LetterPositionEnum.After;
case 'G': return LetterPositionEnum.After;
case 'H': return LetterPositionEnum.After;
case 'I': return LetterPositionEnum.After;
case 'J': return LetterPositionEnum.After;
case 'K': return LetterPositionEnum.After;
case 'L': return LetterPositionEnum.After;
case 'M': return LetterPositionEnum.After;
case 'N': return LetterPositionEnum.After;
case 'O': return LetterPositionEnum.After;
case 'P': return LetterPositionEnum.After;
case 'Q': return LetterPositionEnum.After;
case 'R': return LetterPositionEnum.After;
case 'S': return LetterPositionEnum.After;
case 'T': return LetterPositionEnum.After;
case 'U': return LetterPositionEnum.After;
case 'V': return LetterPositionEnum.After;
case 'W': return LetterPositionEnum.After;
case 'X': return LetterPositionEnum.After;
case 'Y': return LetterPositionEnum.After;
case 'Z': return LetterPositionEnum.After;
default: return LetterPositionEnum.After;
}
case 'C':
switch (compareLetter)
{
case 'A': return LetterPositionEnum.Before;
case 'B': return LetterPositionEnum.PREV;
case 'C': return LetterPositionEnum.MATCH;
case 'D': return LetterPositionEnum.NEXT;
case 'E': return LetterPositionEnum.After;
case 'F': return LetterPositionEnum.After;
case 'G': return LetterPositionEnum.After;
case 'H': return LetterPositionEnum.After;
case 'I': return LetterPositionEnum.After;
case 'J': return LetterPositionEnum.After;
case 'K': return LetterPositionEnum.After;
case 'L': return LetterPositionEnum.After;
case 'M': return LetterPositionEnum.After;
case 'N': return LetterPositionEnum.After;
case 'O': return LetterPositionEnum.After;
case 'P': return LetterPositionEnum.After;
case 'Q': return LetterPositionEnum.After;
case 'R': return LetterPositionEnum.After;
case 'S': return LetterPositionEnum.After;
case 'T': return LetterPositionEnum.After;
case 'U': return LetterPositionEnum.After;
case 'V': return LetterPositionEnum.After;
case 'W': return LetterPositionEnum.After;
case 'X': return LetterPositionEnum.After;
case 'Y': return LetterPositionEnum.After;
case 'Z': return LetterPositionEnum.After;
default: return LetterPositionEnum.After;
}
//etc. Stack Overflow limits characters to 30,000 contact me for full switch case.
default: return LetterPositionEnum.After;
}
}
}
Ok here is the final solution I came up with, I am not sure if it is Optimal Optimal, but seems to be pretty darn fast and I like the logic and love the brevity of code.
Basically, on app startup you pass a List of words of any length to InitWords. This sorts the words and places them into a Dictionary that has 26 keys, one for each letter in the alphabet.
Then during play, you will iterate through the character set, always starting with the first letter and then the first and second letter and so on. The whole time you are trimming down the number of words in your CurrentWordList.
Say you have the string 'icedgt'. You call InitCompare with 'i', which grabs the list stored under key 'I' in MyDictionary and checks whether the first word has length 1; since the words are already in alphabetical order, the word 'I' would be the first one. On the next iteration you pass 'c' to NextCompare, which uses Linq to reduce the list to the words whose second char is 'c'. Then you call NextCompare again with 'e', once more reducing the number of words in CurrentWordList using Linq.
So after the first iteration your CurrentWordList has every word that starts with 'i', on the NextCompare you will have every word that starts with 'ic' and on the NextCompare you will have a subset of that where every word starts with 'ice' and so on.
I am not sure if Linq would have beat my manual gigantic Switch Case in terms of speed, but it is simple and elegant. And for that I am happy.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace Xuzzle.Code
{
public class CompareHelper
{
//Should always be sorted in alphabetical order.
public static Dictionary<char, List<string>> MyDictionary;
public static List<string> CurrentWordList;
//The word we are trying to find matches for.
public static char InitChar;
public static StringBuilder ThisWord;
/// <summary>
/// Init MyDictionary with the list of words passed in. Make a new
/// key value pair with each Letter.
/// </summary>
/// <param name="listOfWords"></param>
public static void InitWords(List<string> listOfWords)
{
MyDictionary = new Dictionary<char, List<string>>();
foreach (char currChar in LetterHelper.Alphabet)
{
var wordsParsed = listOfWords.Where(currWord => char.ToUpper(currWord[0]) == currChar).ToArray();
Array.Sort(wordsParsed);
MyDictionary.Add(currChar, wordsParsed.ToList());
}
}
/// <summary>
/// Initialize the Compare. Set the first character. See if there are any 1 letter words
/// for that character.
/// </summary>
/// <param name="firstChar">The first character in the word string.</param>
/// <returns>True if a word was found.</returns>
public static bool InitCompare(char firstChar)
{
InitChar = firstChar;
//Get all words that start with the firstChar.
CurrentWordList = MyDictionary[InitChar];
ThisWord = new StringBuilder();
ThisWord.Append(firstChar);
if (CurrentWordList[0].Length == 1)
{
//Match.
return true;
}
//No matches.
return false;
}
/// <summary>
/// Append this letter to our ThisWord. See if there are any matching words.
/// </summary>
/// <param name="nextChar">The next character in the word string.</param>
/// <returns>True if a word was found.</returns>
public static bool NextCompare(char nextChar)
{
ThisWord.Append(nextChar);
int currentIndex = ThisWord.Length - 1;
if (CurrentWordList != null && CurrentWordList.Count > 0)
{
CurrentWordList = CurrentWordList.Where(word => (word.Length > currentIndex && word[currentIndex] == nextChar)).ToList();
if (CurrentWordList != null && CurrentWordList.Count > 0)
{
if (CurrentWordList[0].Length == ThisWord.Length)
{
//Match.
return true;
}
}
}
//No matches.
return false;
}
}
}

C# Convert Alphanumeric phone number

I've been working on this issue for a while and I've been stuck, so I hope someone can push me in the right direction. I have a C# console application that will take in a string and verify that it contains only 0-9, a-z, A-Z, and -.
My issue is that I need to convert any letters in the phone number to their respective number. So if I input 1800-Flowers, it will output as 1800-3569377. I have my methods defined:
I'm not looking for the solutions here (this is homework), but I'm looking for a push in the right direction. Do I need to convert the string to a char array to break up each individual character, and then use that in the convert method to switch any letter into a number?
There are certainly a lot of solutions here. Since you're already using Regex, you could approach it in a basic way:
num = Regex.Replace(num, @"[abcABC]", "2");
num = Regex.Replace(num, @"[defDEF]", "3");
//....
or you could create a Dictionary<string,char> and run through each char and convert it to the mapped character. Something like :
var dict = new Dictionary<string, char>();
dict.Add("abcABC",'2');
//...
foreach(char c in num.Where(char.IsLetter))
{
var digit = dict.First(d => d.Key.Contains(c)).Value;
num = num.Replace(c, digit);
}
Like you said, the LINQ here is splitting the string into a char array and looping through the ones that are letters.
Since this is for school, I'm sure you can't go crazy with more advanced topics. Let's keep it simple with a switch/case.
You can map the letters to their corresponding number first, just use a switch/case to find the correct number depending on the letter.
For example:
String phoneNumber = "1800ab";
for(int x=0; x < phoneNumber.Length; x++)
{
if(Char.IsLetter(phoneNumber[x]))
{
switch(phoneNumber[x].ToString().ToLower())
{
case "a":
case "b":
case "c":
//This is number 2!
break;
}
}
}
String already implements IEnumerable<char> - so no need to "break up" there.
Mapping of something to something (like letter code to matching number) is generally done with map (associative array) types (in C#/.Net it is Dictionary) that provide mapping one value ("key") to corresponding "value" - consider using that.
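A minimal sketch of the Dictionary idea follows, assuming the standard phone keypad grouping. The names KeypadSketch and ToDigits are hypothetical; characters that are not letters (digits, '-') are simply copied through unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class KeypadSketch
{
    // Letter-to-digit table, built once from the keypad groups.
    private static readonly Dictionary<char, char> Map = BuildMap();

    private static Dictionary<char, char> BuildMap()
    {
        var groups = new[] { "abc", "def", "ghi", "jkl", "mno", "pqrs", "tuv", "wxyz" };
        var map = new Dictionary<char, char>();
        for (int i = 0; i < groups.Length; i++)
            foreach (char letter in groups[i])
                map[letter] = (char)('2' + i); // "abc" -> '2', "def" -> '3', ...
        return map;
    }

    public static string ToDigits(string phone)
    {
        var sb = new StringBuilder(phone.Length);
        foreach (char c in phone) // string is IEnumerable<char>; no ToCharArray needed
        {
            char digit;
            // Look up the lower-cased letter; keep the character as-is if it is not mapped.
            sb.Append(Map.TryGetValue(char.ToLowerInvariant(c), out digit) ? digit : c);
        }
        return sb.ToString();
    }
}
```

For the input from the question, KeypadSketch.ToDigits("1800-Flowers") yields "1800-3569377".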
string letter1 = AskuserforInput("first letter");
string number1 = SwitchMethod(letter1);
string letter2 = AskuserforInput("second letter");
string number2 = SwitchMethod(letter2);
string letter3 = AskuserforInput("third letter");
string number3 = SwitchMethod(letter3);
string letter4 = AskuserforInput("fourth letter");
string number4 = SwitchMethod(letter4);
string letter5 = AskuserforInput("fifth letter");
string number5 = SwitchMethod(letter5);
string letter6 = AskuserforInput("sixth letter");
string number6 = SwitchMethod(letter6);
string letter7 = AskuserforInput("seventh letter");
string number7 = SwitchMethod(letter7);
string letter8 = AskuserforInput("eighth letter");
string number8 = SwitchMethod(letter8);
string letter9 = AskuserforInput("ninth letter");
string number9 = SwitchMethod(letter9);
string letter10 = AskuserforInput("tenth letter");
string number10 = SwitchMethod(letter10);
//declaring strings
Console.WriteLine("This is the original letter phone digits");
Console.WriteLine("({0}{1}{2}) {3}{4}{5} - {6}{7}{8}{9} ", letter1,letter2, letter3, letter4, letter5, letter6, letter7, letter8, letter9, letter10);//continue this
Console.WriteLine("The actual numbers" );
Console.WriteLine("({0}{1}{2}) {3}{4}{5} - {6}{7}{8}{9} ", number1, number2, number3, number4, number5, number6, number7, number8, number9, number10);//continue this
Console.Read();
#region End Program
//wait for program to acknowledge results
Console.BackgroundColor = ConsoleColor.White;
Console.ForegroundColor = ConsoleColor.Red;
Console.WriteLine("\n\nPlease hit ENTER to end program. . .");
Console.Read();
#endregion
Console.Read();
//also pulled this back up from a previous program
}
public static string SwitchMethod(string x)
{
string y = "*";
switch (x)
{
case "0":
y = "0";
break;
case "1":
y = "1";
break;
case "A":
case "a":
case "B":
case "b":
case "C":
case "c":
case "2":
y = "2";
break;
case "D":
case "d":
case "E":
case "e":
case "F":
case "f":
case "3":
y = "3";
break;
case "G":
case "g":
case "H":
case "h":
case "I":
case "i":
case "4":
y = "4";
break;
case "J":
case "j":
case "K":
case "k":
case "L":
case "l":
case "5":
y = "5";
break;
case "M":
case "m":
case "N":
case "n":
case "O":
case "o":
case "6":
y = "6";
break;
case "P":
case "p":
case "Q":
case "q":
case "R":
case "r":
case "S":
case "s":
case "7":
y = "7";
break;
case "T":
case "t":
case "U":
case "u":
case "V":
case "v":
case "8":
y = "8";
break;
case "W":
case "w":
case "X":
case "x":
case "Y":
case "y":
case "Z":
case "z":
case "9":
y ="9";
break;
default:
Console.WriteLine("knucklehead, not a letter");
Console.WriteLine("an '*' will show up");
break;
//used cases, next will use to.lower
//Lynch helped
}
return y;
}
public static string AskuserforInput(string x)
{
Console.WriteLine("\nPlease type {0}", x);
String input = Console.ReadLine();
return input;
}
I'm sure someone can think of a better way, but you could loop through each character and pass it to this function:
int Asc(char ch)
{
// Return the ASCII code of the given character (needs using System.Text).
// Encoding.ASCII.GetBytes has no overload taking a single char, so wrap it in an array.
return Encoding.ASCII.GetBytes(new[] { ch })[0];
}
Then just assign a number based on which ASCII code is returned.
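As a rough sketch of the per-character idea above, the long switch could be replaced by a small lookup over the keypad letter groups. The class name Keypad and the method ToDigit are made up for illustration; the mapping is assumed to be the same one SwitchMethod implements (digits map to themselves, anything else becomes '*').

```csharp
using System;

static class Keypad
{
    // Letters grouped by keypad digit; index 0 holds the letters for '2'.
    private static readonly string[] Groups =
        { "ABC", "DEF", "GHI", "JKL", "MNO", "PQRS", "TUV", "WXYZ" };

    // Hypothetical helper: returns the keypad digit for a letter,
    // the digit itself for '0'-'9', and '*' for anything else.
    public static char ToDigit(char ch)
    {
        if (ch >= '0' && ch <= '9')
            return ch;

        char upper = char.ToUpperInvariant(ch);
        for (int i = 0; i < Groups.Length; i++)
        {
            if (Groups[i].IndexOf(upper) >= 0)
                return (char)('2' + i);
        }
        return '*';
    }
}
```

With something like this, the ten letterN/numberN pairs could probably collapse into a single loop that reads a character and calls Keypad.ToDigit on it.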

C# switch case how to calculate

Hey, I'm trying to combine two numbers via switch case.
I have 3 inputs: number 1, number 2, and the operation I'd like to apply to them (e.g. +, -, *, /, etc.).
Now the problem is how do I create something like this? I've tried it this way, but it does not work...
Is it possible to write a switch case like this: case %: ?
Thanks
string firstNumber;
string secondNumber;
string method;
//get numbers
Console.WriteLine ("Get first number");
firstNumber = Console.ReadLine ();
Console.WriteLine ("get 2nd number");
secondNumber = Console.ReadLine ();
Console.WriteLine ("the method to calculate with");
Console.WriteLine (" 1:\"*\"");
Console.WriteLine (" 2:\"/\"");
Console.WriteLine (" 3:\"+\"");
Console.WriteLine (" 4:\"-\"");
method = Console.ReadLine ();
//convert
int methodNew = Convert.ToInt32 (method);
int firstNumberNew = Convert.ToInt32 (firstNumber);
int secondNumberNew = Convert.ToInt32 (secondNumber);
switch (methodNew) {
case 1:
firstNumberNew *= secondNumberNew;
break;
default:
Console.WriteLine ("check the methods.");
break;
}
Console.WriteLine (methodNew);
Of course, you can read in a char and do a switch-case
for it:
int c = Console.Read(); // read from console
switch(c) {
case '/':
// work
break;
case '%':
// work
break;
case '*':
// work
break;
case '+':
// work
break;
}
First get the operator then use switch like this:
char method = Console.ReadKey().KeyChar;
double result = 0.0;
switch (method)
{
case '+':
result = firstNumberNew + secondNumberNew;
break;
case '-':
result = firstNumberNew - secondNumberNew;
break;
case '/':
result = firstNumberNew / secondNumberNew;
break;
case '%':
result = firstNumberNew % secondNumberNew;
break;
default:
Console.WriteLine("Invalid value, try again");
break;
}
Console.WriteLine(result);
It would be better if you changed the type of firstNumberNew and secondNumberNew to double; with int operands, / performs integer division. Also, you can use a while loop to force the user to enter a valid character.
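To make the two suggestions concrete, here is a minimal sketch that uses double operands and keeps asking until a supported operator is entered. The names Calculator, TryCalculate and ReadOperator are invented for this example, and the set of operators is only an assumption based on the ones mentioned in this thread.

```csharp
using System;
using System.IO;

static class Calculator
{
    // Hypothetical helper: applies op to a and b.
    // Returns false (and a result of 0) for an unsupported operator.
    public static bool TryCalculate(double a, double b, char op, out double result)
    {
        switch (op)
        {
            case '+': result = a + b; return true;
            case '-': result = a - b; return true;
            case '*': result = a * b; return true;
            case '/': result = a / b; return true;
            case '%': result = a % b; return true;
            default:  result = 0.0;   return false;
        }
    }

    // Reads lines until one holds exactly one supported operator.
    // Returns '\0' if the input ends before a valid operator is seen.
    public static char ReadOperator(TextReader input, TextWriter output)
    {
        string line;
        while ((line = input.ReadLine()) != null)
        {
            line = line.Trim();
            double ignored;
            if (line.Length == 1 && TryCalculate(0, 1, line[0], out ignored))
                return line[0];
            output.WriteLine("Invalid operator, try again.");
        }
        return '\0';
    }
}
```

From a console program this could be called as Calculator.ReadOperator(Console.In, Console.Out). Note that with double operands a division by zero yields Infinity or NaN instead of throwing, so that case may still need its own check.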

complex string to double conversion

I have strings in an XML file that are meant to be doubles (or floats), such as:
<VIPair>
<voltage>+100mV</voltage>
<current>+1.05pA</current>
</VIPair>
<VIPair>
<voltage>+5.00mV</voltage>
<current>+0.0035nA</current>
</VIPair>
The first pair would be "0.1" volt and "0.00000000000105" ampere.
The second pair would be "0.005" volt and "0.0000000000035" ampere.
How can I convert them to double or float in C#?
Thanks.
P.S.: I can already read them from the XML file; at the moment I retrieve them as strings.
Try with this:
// Read string (if you do not want to use xml)
string test = "<voltage>+100mV</voltage>";
string measure = test.Substring(test.IndexOf('>')+1);
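// Cut at the closing tag; the extra -1 also drops the unit letter (V, A, ...)
measure = measure.Substring(0, measure.IndexOf('<')-1);
// Extract the SI prefix (assumes one is always present)
string um = measure.Substring(measure.Length - 1);
measure = measure.Substring(0, measure.Length - 1);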
// Get value
double val = Double.Parse(measure);
// Convert value according to measure unit
switch (um)
{
case "G": val *= 1E9; break;
case "M": val *= 1E6; break;
case "k": val *= 1E3; break;
case "m": val /= 1E3; break;
case "u": val /= 1E6; break;
case "n": val /= 1E9; break;
case "p": val /= 1E12; break;
}
Hi here is another version of what Marco has written.
string str = "1pV";
double factor;
double value;
switch (str[str.Length-2])
{
case 'M': factor = 1E6; break;
case 'm': factor = 1E-3; break;
case 'n': factor = 1E-9; break;
case 'p': factor = 1E-12; break;
default:
factor = 1; break;
}
value = double.Parse(str.Substring(0,str.Length-2)) * factor;
Assuming that the XML text is already available to you, I have tried to do the same thing with one Substring call, a switch on characters instead of strings (this is a bit faster than comparing strings), and a single double.Parse. Hope someone comes up with a better version than this.
Remove the suffix string mV and nA using string.Substring or string.Remove() method and use double.TryParse() method to parse string to double.
If your values always have 2 chars on the end you could simply remove these and parse the number (the prefix still has to be applied as a scale factor afterwards).
var newstring = fullstring.Substring(0, fullstring.Length - 2);
var number = double.Parse(newstring);
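Putting the pieces from the answers above together, a sketch along these lines might handle values with or without a prefix. Measurement, TryParse and TryGetFactor are names invented for the example. It assumes a one-letter unit (V, A, ...) optionally preceded by one of the SI prefixes listed in the switch above, and it parses with the invariant culture so that "." is always treated as the decimal separator.

```csharp
using System;
using System.Globalization;

static class Measurement
{
    // Hypothetical helper: maps an SI prefix letter to its scale factor.
    private static bool TryGetFactor(char prefix, out double factor)
    {
        switch (prefix)
        {
            case 'G': factor = 1E9;   return true;
            case 'M': factor = 1E6;   return true;
            case 'k': factor = 1E3;   return true;
            case 'm': factor = 1E-3;  return true;
            case 'u': factor = 1E-6;  return true;
            case 'n': factor = 1E-9;  return true;
            case 'p': factor = 1E-12; return true;
            default:  factor = 1.0;   return false;
        }
    }

    // Parses text such as "+100mV" or "+0.0035nA" into base units.
    // Returns false instead of throwing when the text does not fit the assumed shape.
    public static bool TryParse(string text, out double value)
    {
        value = 0.0;
        if (string.IsNullOrEmpty(text) || text.Length < 2)
            return false;

        // Drop the trailing unit letter, then the prefix letter if there is one.
        int numberLength = text.Length - 1;
        double factor;
        if (TryGetFactor(text[numberLength - 1], out factor))
            numberLength--;

        double number;
        if (!double.TryParse(text.Substring(0, numberLength), NumberStyles.Float,
                             CultureInfo.InvariantCulture, out number))
            return false;

        value = number * factor;
        return true;
    }
}
```

For the values in the question, Measurement.TryParse("+100mV", out v) should give roughly 0.1 and "+1.05pA" roughly 1.05E-12, subject to the usual floating-point rounding.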

Best approach to get ipv4 last octet

I know Substring can handle this, but is there a better way to get the last octet from an IP?
Ex.:
192.168.1.100
I want 100
Tks
just for fun:
Console.WriteLine(IPAddress.Parse("192.168.1.33").GetAddressBytes()[3]);
Just for fun I wrote the version which would have the least overhead (string manipulation etc.). @rushui has the correct answer though.
static void Main(string[] args)
{
Console.WriteLine(OctetInIP("10.1.1.100", 0));
Console.ReadLine();
}
static byte OctetInIP(string ip, int octet)
{
var octCount = 0;
var result = 0;
// Loop through each character.
for (var i = 0; i < ip.Length; i++)
{
var c = ip[i];
// If we hit a full stop.
if (c == '.')
{
// Return the value if we are on the correct octet.
if (octCount == octet)
return (byte)result;
octCount++;
}
else if (octCount == octet)
{
// Convert the current octet to a number.
result *= 10;
switch (c)
{
case '0': break;
case '1': result += 1; break;
case '2': result += 2; break;
case '3': result += 3; break;
case '4': result += 4; break;
case '5': result += 5; break;
case '6': result += 6; break;
case '7': result += 7; break;
case '8': result += 8; break;
case '9': result += 9; break;
default:
throw new FormatException();
}
if (result > 255)
throw new FormatException();
}
}
if (octCount != octet)
throw new FormatException();
return (byte)result;
}
It may be overkill, but a simple regex would also do the trick:
(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})
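In case it helps, this is roughly how that pattern could be applied in C#. The class and method names are invented for the example, and the ^ and $ anchors are an addition so that the whole string has to match. Keep in mind the pattern only checks the shape: it does not reject values above 255.

```csharp
using System.Text.RegularExpressions;

static class IpRegex
{
    // Same four-group pattern as above, anchored to the whole input.
    private static readonly Regex Ipv4 =
        new Regex(@"^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$");

    // Hypothetical helper: returns the text of the fourth group,
    // or null when the input does not look like a dotted quad.
    public static string LastOctet(string ip)
    {
        Match match = Ipv4.Match(ip);
        return match.Success ? match.Groups[4].Value : null;
    }
}
```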
Remember what an IP address is, it is a 32-bit (4 byte) number. So masking the address with the subnet mask would actually be the correct way to do it. If you always want a subnet mask of 255.255.255.0, as your question implies, you can & the number with 0xFF to get the number.
But, if you don't care about efficiency, and only have the address as a string, the split on "." is just fine... :)
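For comparison, here is a small sketch of the three simpler routes mentioned in this thread: letting IPAddress validate the string, splitting on ".", and masking the packed 32-bit value with 0xFF. The class and method names are made up. As far as I know, IPAddress.TryParse also accepts some shorthand forms with fewer than four parts, so it may be more lenient than expected.

```csharp
using System;
using System.Globalization;
using System.Net;
using System.Net.Sockets;

static class IpOctets
{
    // Validates the whole string as an IPv4 address, then takes the fourth byte.
    public static bool TryGetLastOctet(string ip, out byte octet)
    {
        octet = 0;
        IPAddress address;
        if (!IPAddress.TryParse(ip, out address))
            return false;
        if (address.AddressFamily != AddressFamily.InterNetwork)
            return false;

        octet = address.GetAddressBytes()[3];
        return true;
    }

    // Only looks at the text after the last dot; throws if it is not 0-255.
    public static byte LastOctetBySplit(string ip)
    {
        string[] parts = ip.Split('.');
        return byte.Parse(parts[parts.Length - 1], CultureInfo.InvariantCulture);
    }

    // Packs the four bytes into one 32-bit value (first octet in the high byte)
    // and masks with 0xFF, i.e. the host part under a 255.255.255.0 mask.
    public static byte LastOctetByMask(byte[] addressBytes)
    {
        uint packed = ((uint)addressBytes[0] << 24)
                    | ((uint)addressBytes[1] << 16)
                    | ((uint)addressBytes[2] << 8)
                    | (uint)addressBytes[3];
        return (byte)(packed & 0xFF);
    }
}
```

Which one is "best" probably depends on whether the input is already known to be a valid address; the split version does the least checking.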
