Regex match with dictionary key

Regex match with dictionary key - c#

I'm expanding upon my previously asked question.
Before I explain, here is my C# code:
static Dictionary<string, string> phenom = new Dictionary<string, string>
{
{"-", "Light"},
{"\\+", "Heavy"},
{"\bVC\b","In the Vicinity"},
// descriptor
{"MI","Shallow"},
{"PR","Partial"},
{"BC","Patches"},
{"DR","Low Drifting"},
{"BL","Blowing"},
{"SH","Showers"},
{"TS","Thunderstorm"},
{"FZ","Freezing"},
// precipitation
{"DZ","Drizzle"},
{"RA","Rain"},
{"SN","Snow"},
{"SG","Snow Grains"},
{"IC","Ice Crystals"},
{"PL","Ice Pellets"},
{"GR","Hail"},
{"GS","Small Hail/Snow Pellets"},
{"UP","Uknown Precipitation"},
// obscuration
{"BR","Mist"},
{"FG","Fog"},
{"FU","Smoke"},
{"VA","Volcanic Ash"},
{"DU","Widespread Dust"},
{"SA","Sand"},
{"HZ","Haze"},
{"PY","Spray"},
// other
{"PO","Well-Developed Dust/Sand Whirls"},
{"SQ","Squalls"},
{"FC","Funnel Cloud Tornado Waterspout"},
{"SS","Sandstorm"},
{"DS","Duststorm"}
};
public static string Process(String metar)
{
metar = Regex.Replace(metar, "(?<=A[0-9]{4}).*", "");
StringBuilder _string = new StringBuilder();
var results = from result in phenom
where Regex.Match(metar, result.Key, RegexOptions.Singleline).Success
select result;
foreach (var result in results)
{
if (result.Key == "DZ" || result.Key == "RA" || result.Key == "SN" || result.Key == "SG" || result.Key == "PL")
{
switch (result.Key)
{
case "+":
_string.Append("Heavy ");
break;
case "-":
_string.Append("Light");
break;
case "VC":
_string.Append("In the Vicinity ");
break;
default:
break;
}
}
_string.AppendFormat("{0} ", result.Value);
}
return _string.ToString();
}
Basically, this code parses the weather phenomenons in an airport METAR weather report. Take the following METAR for example:
KAMA 230623Z AUTO 05016KT 7SM +TSRAGR FEW070 BKN090 OVC110 16/14 A3001
RMK AO2 PK WND 06026/0601 LTG DSNT E-SW RAB04 TSB02E17 P0005 T01560144
Notice the +TSRAGR...this is the portion I want to focus on. The code I currently have works fine...but not close to what I actually want. It spits out this: "Heavy Thunderstorm Rain Hail".
The actual decoding differs slightly. Referenced from this manual (see the 5th page).
The intensity indicators (+ and -) always come before the first precipitation. In the above METAR, it would be RA. So, ideally it should spit out "Thunderstorm Heavy Rain Hail", instead. However, it is not. There are other times when the phenomena is not as intense...sometimes it may only be -RA, in which case should only return "Light Rain".
Also to note, as the manual I referenced says, the intensity identifiers are not coded with the IC, GR, GS or UP precipitation types, which explains my attempt at checking the key value before appending the intensity, but failed.
If someone could point me in the right direction on how to append the intensity identifiers in the right location, I'd much appreciate it.
TL;DR...basically, I some code logic to decide where to put prefixes before specific items from a dictionary list.

Given the structure of your code, I would recommend pre-parsing your input string to "move" the intensity indicators to after the terms they don't modify, so the words come out in the order that makes the most sense.
Then, I might consider adding a few keys to your dictionary to cover the +/- versions of each type of precipitation that can have an intensity indicator, so your overall code is simplified.

Related

Parsing CSV data using a finite state machine

I want to read a file containing comma-separated values, so have written a finite state machine:
private IList<string> Split(string line)
{
List<string> values = new List<string>();
string value = string.Empty;
ParseState state = ParseState.Initial;
foreach (char c in line)
{
switch (state)
{
case ParseState.Initial:
switch (c)
{
case COMMA:
values.Add(string.Empty);
break;
case QUOTE:
state = ParseState.Quote;
break;
default:
value += c;
state = ParseState.Data;
break;
}
break;
case ParseState.Data:
switch (c)
{
case COMMA:
values.Add(value);
value = string.Empty;
state = ParseState.Initial;
break;
case QUOTE:
throw new InvalidDataException("Improper quotes");
default:
value += c;
break;
}
break;
case ParseState.Quote:
switch (c)
{
case QUOTE:
state = ParseState.QuoteInQuote;
break;
default:
value += c;
break;
}
break;
case ParseState.QuoteInQuote:
switch (c)
{
case COMMA:
values.Add(value);
value = string.Empty;
state = ParseState.Initial;
break;
case QUOTE:
value += c;
state = ParseState.Quote;
break;
default:
throw new InvalidDataException("Unpaired quotes");
}
break;
}
}
switch (state)
{
case ParseState.Initial:
case ParseState.Data:
case ParseState.QuoteInQuote:
values.Add(value);
break;
case ParseState.Quote:
throw new InvalidDataException("Unclosed quotes");
}
return values;
}
Yes, I know the advice about CSV parsers is "don't write your own", but
I needed it quickly and
our download policy at work would take several days to allow me to
get open source off the 'net.
Hey, at least I didn't start with string.Split() or, worse, try using a Regex!
And yes, I know it could be improved by using a StringBuilder, and it's restrictive on quotes in the data, but
performance is not an issue and
this is only to generate well-defined test data in-house,
so I don't care about those.
What I do care about is the apparent trailing block at the end for mopping up all the data after the final comma, and the way that it's starting to look like some sort of an anti-pattern down there, which was exactly the sort of thing that "good" patterns such as a FSM were supposed to avoid.
So my question is this: is this block at the end some sort of anti-pattern, and is it something that's going to come back to bite me in the future?

All of the FSMs I've ever seen (not that I go hunting for them, mind you) all have some kind of "mopping up" step, simply due to the nature of enumeration.
In an FSM you're always acting upon the current state, and then resetting the 'current state' for the next iteration, so once you've hit the end of your iterations you have to do one last operation to act upon the 'current state'. (Might be better to think about it as acting upon the 'previous state' and setting the 'current state').
Therefore, I would consider that what you've done is part of the pattern.
But why didn't you try some of the other answers on SO?
Split CSV String (specifically this answer)
How to properly split a CSV using C# split() function? (specifically this answer)
Adapted solution, still an FSM:
public IEnumerable<string> fsm(string s)
{
int i, a = 0, l = s.Length;
var q = true;
for (i = 0; i < l; i++)
{
switch (s[i])
{
case ',':
if (q)
{
yield return s.Substring(a, i - a).Trim();
a = i + 1;
}
break;
// pick your flavor
case '"':
//case '\'':
q = !q;
break;
}
}
yield return s.Substring(a).Trim();
}
// === usage ===
var row = fsm(csvLine).ToList();
foreach(var column in fsm(csvLine)) { ... }

In a FSM you identify which states are the permitted halting states. So in a typical implementation, when you come out of the loop you need to at least check to make sure that your last state is one of the permitting halting states or throw a jam error. So having that one last state check outside of the loop is part of the pattern.

The source of the problem, if you want to call it that, is the absence of an end-of-line marker in your input data. Add a newline character, for example, at the end of your input string and you will be able to get rid of the "trailing block" that seems to annoy you so much.
As far as I'm concerned, your code is correct and, no, there is no reason why this implementation will come back to bite you in the future!

I had a similiar issue, but i was parsing a text file character by character. I didnt like this big clean-up-switch-block after the while loop. To solve this, I made a wrapper for the streamreader. The wrapper checked when streamreader had no characters left. In this case, the wrapper would return an EOT-ascii character once (EOT is equal to EOF). This way my state machine could react to the EOF depending on the state it was in at that moment.

Y/N or y/n in loop

I have trouble implementing the Y/N or y/n in the loop. I've designed it in a way that a user can use both the capital and small letters of the Y and N for their answer in a loop. by the way here's my code but can't seem to make it work:
do
{
Console.WriteLine("\nSelect additional topping/s\n");
Console.WriteLine("1 - Extra meat: 200");
Console.WriteLine("2 - Extra cheese: 100");
Console.WriteLine("3 - Extra veggies: 80\n");
int selectedTopping = Convert.ToInt32(Console.ReadLine());
switch (selectedTopping)
{
case 1:
pizza = new MeatToppings(pizza);
break;
case 2:
pizza = new CheeseToppings(pizza);
break;
case 3:
pizza = new VeggieToppings(pizza);
break;
default:
break;
}
Console.WriteLine("\nAdd more toppings? Y/N");
}
while ((Console.ReadLine() == "Y") || (Console.ReadLine() == "y"));

while ((Console.ReadLine() == "Y") || (Console.ReadLine() == "y"));
This is going to read 2 different lines since you're calling ReadLine() twice. You need to call it once and save the value.

You can use ToUpper
while ((Console.ReadLine().ToUpper() == "Y") );

Try to use String.Equals and StringComparison:
String.Equals(Console.ReadLine(), "y", StringComparison.CurrentCultureIgnoreCase);
from MSDN:
CurrentCultureIgnoreCase: Compare strings using culture-sensitive sort rules, the current culture, and ignoring the case of the strings being compared.
OrdinalIgnoreCase: Compare strings using ordinal sort rules and ignoring the case of the strings being compared.

To check Y or y ignoring case, you should use string.Equals(string,StringComparison) overload.
while (Console.ReadLine().Equals("Y", StringComparison.InvariantCultureIgnoreCase));
Please see the The Turkish İ Problem and Why You Should Care before using ToUpper or ToLower for string comparison with ignore case.
Your current code is reading the lines from console twice, that is why your code is holding up for the 2nd value.

As Austin just pointed out, you are using ReadLine twice in the while loop statement.
One thing worth mentioning is try to follow the rule of modularity, it will help speed up implementing and debugging our code.
It's been a while since I did any C# programming so sudo-coding this in Java style.
Since it's a command line programming you are probably have to validate user input more than once. One thing I would do is make a utility class to contains common user input tasks.
public class TerminalUtil {
private TerminalUtil() {}
public static boolean isYes(String msg){ return (msg.ToUpper() == "Y" || msg.ToUpper() == "YES"); }
public static boolean isNo(String msg){ return (msg.ToUpper() == "N" || msg.ToUpper() == "NO"); }
// You also might want basic conditionals to check if string is numeric or contains letters.
// I like using recursion for command line utilities so having a method that can re-print messages is handy
public static void display(String[] messages){
for(String msg : messages){
Console.WriteLine(msg);
}
}
public static boolean enterYesOrNo(String[] messages, String[] errorMessages){
display(messages)
String input = Console.ReadLine();
if( isYes(input) ){
return true;
} else if( isNo(input) ){
return false;
} else {
display(errorMessages); // Maybe something like, you didn't enter a yes or no value.
enterYesOrNo(messages, errorMessages); // Recursive loop to try again.
}
}
}
Here is what the code to order a pizza might look like
public class OrderPizza{
public static int selectToppings(){
String[] message = new String[4];
message[0] = ("\nSelect additional topping/s\n");
message[1] = ("1 - Extra meat: 200");
message[2] = ("2 - Extra cheese: 100");
message[3] = ("3 - Extra veggies: 80\n");
int option = TerminalUtils.entryNumeric(message, {"You entered an non-numeric character, try again"} );
if( option > 0 && option <= 3 ){
return option;
} else {
Console.WriteLine("Number must be between 1 - 3, try again.");
return selectToppings();
}
}
public static Pizza order(){
Pizza pizza = new Pizza();
while(true){
int toppingCode = selectTopping();
pizza.addTopping(toppingCode);
if(!TerminalUtil.enterYesOrNo({"\nAdd more toppings? Y/N"}, {"Please enter a 'Y'es or 'N'o"}) ){
break;
}
}
}
}
The main benefit of this is that the business logic of the while loop has been reduced and you can reuse the code in TerminalUtils. Also this is by no mean a elegant solution, I'm lazy and it's 3am IRL, but it's should be enough to the ball rolling.
One thing you should probably reconsider doing is using integer codes to represent toppings. Using an enum might make things easier to implement.
I also notice that you add three different types of pizza's, which I'm assuming three separate objects.
Since you are looping to add toppings to a pizza, make an abstract class of pizza. This way you could extend it generic pre-built pizzas such as pepperoni or cheese and use the abstract pizza class if you want the customer to customize their order.

I didn't found better way than:
while ( str!="N" )
{
str = Console.ReadLine();
str = str.ToUpper();
if (str == "Y");
break;
};

Parsing custom data tokens and replacing with values in C#

I have about 10 pieces of data from a record and I want to be able to define the layout of a string where this data is returned, with the option of leaving some pieces out. My thought was to use an enum to give integer values to my tokens/fields and then have a format like {0}{1}{2}{3} or something as complicated as {4} - {3}{1} [{8}]. The meaning of the tokens relates to fields in my database. For instance I have this enum for my tokens relating to payments made.
AccountMask = 0,
AccountLast4 = 1,
AccountFirstDigit = 2,
AccountFirstLetter = 3,
ItemNumber = 4,
Amount = 5
Account mask is a string like VXXXXX1234 where the V is for a visa, and 1234 are the last 4 digits of the card. Sometimes clients wants the V, sometimes they want the first digit (It's easy to translate a card type into a first digit).
My goal is to create something reusable to generate a string using tokens in a format string that will then use the data associated with the digit inside the token to do an in place replace of the data.
So, for an example using the mask above and my enum if I wanted to define a format 9{2}{1}{4:[0:0000000000]}
if the item number is 678934
which would then translate to 9412340000678934 where the inner part of token 4 becomes a definition for a String.Format of that value. Also, the data placed around the tokens is ignored and kept in place.
My issue comes to the manipulation of the string and the best practice. I have been told that regular expressions can be costly if you are going to create them on the fly. As a CS major, I have a feeling the "right" (however complex) solution is to make a lexer/parser for my tokens. I have no experience writing a lexer/parse in C# so I'm not sure of the best practices around it. I'm looking for guidance here on a system that is efficient and easy to tweak.

I see this problem, and I immediately thought that the solution is pretty simple; store the masks you wish to use, with constant values for various data fields you wish to include, and pass the masks into a constant call to String.Format():
const string CreditCardWithFirstLetterMask = "{3}XXXXXXXXXXX{1}";
const string CreditCardWithFirstDigitMask = "{2}XXXXXXXXXXX{1}";
...
var formattedString = String.Format(CreditCardWithFirstDigitMask,
record.AccountMask,
record.AccountLast4,
record.AccountFirstDigit,
record.AccountFirstLetter,
record.ItemNumber,
record.Amount);

I ended up putting the regex as a static object in the class and then looping through matches to perform replacements and build out my token.
var token = request.TokenFormat;
var matches = tokenExpression.Matches(request.TokenFormat);
foreach (Match match in matches)
{
var value = match.Value;
var tokenCode = (Token)Convert.ToInt32(value.Substring(1, (value.Contains(":") ? value.IndexOf(":") : value.IndexOf("}")) - 1));
object data = null;
switch (tokenCode)
{
case Token.AccountMask:
data = accountMask;
break;
case Token.AccountLast4:
data = accountMask.Substring(accountMask.Length - 4);
break;
case Token.AccountFirstDigit:
string firstLetter = accountMask.Substring(0, 1);
switch (firstLetter.ToUpper())
{
case "A":
data = 3;
break;
case "V":
data = 4;
break;
case "M":
data = 5;
break;
case "D":
data = 6;
break;
}
break;
case Token.AccountFirstLetter:
data = accountMask.Substring(0, 1);
break;
case Token.ItemNumber:
if(item != null)
data = item.PaymentId;
break;
case Token.Amount:
if (item != null)
data = item.Amount;
break;
case Token.PaymentMethodId:
if (paymentMethod != null)
data = paymentMethod.PaymentMethodId;
break;
}
if (formatExpression.IsMatch(value))
{
Match formatMatch = formatExpression.Match(value);
string format = formatMatch.Value.Replace("[", "{").Replace("]", "}");
token = token.Replace(value, String.Format(format, data));
}
else
{
token = token.Replace(value, String.Format("{0}", data));
}
}
return token;

How to make C# Switch Statement use IgnoreCase

If I have a switch-case statement where the object in the switch is string, is it possible to do an ignoreCase compare?
I have for instance:
string s = "house";
switch (s)
{
case "houSe": s = "window";
}
Will s get the value "window"? How do I override the switch-case statement so it will compare the strings using ignoreCase?

A simpler approach is just lowercasing your string before it goes into the switch statement, and have the cases lower.
Actually, upper is a bit better from a pure extreme nanosecond performance standpoint, but less natural to look at.
E.g.:
string s = "house";
switch (s.ToLower()) {
case "house":
s = "window";
break;
}

Sorry for this new post to an old question, but there is a new option for solving this problem using C# 7 (VS 2017).
C# 7 now offers "pattern matching", and it can be used to address this issue thusly:
string houseName = "house"; // value to be tested, ignoring case
string windowName; // switch block will set value here
switch (true)
{
case bool b when houseName.Equals("MyHouse", StringComparison.InvariantCultureIgnoreCase):
windowName = "MyWindow";
break;
case bool b when houseName.Equals("YourHouse", StringComparison.InvariantCultureIgnoreCase):
windowName = "YourWindow";
break;
case bool b when houseName.Equals("House", StringComparison.InvariantCultureIgnoreCase):
windowName = "Window";
break;
default:
windowName = null;
break;
}
This solution also deals with the issue mentioned in the answer by #Jeffrey L Whitledge that case-insensitive comparison of strings is not the same as comparing two lower-cased strings.
By the way, there was an interesting article in February 2017 in Visual Studio Magazine describing pattern matching and how it can be used in case blocks. Please have a look: Pattern Matching in C# 7.0 Case Blocks
EDIT
In light of #LewisM's answer, it's important to point out that the switch statement has some new, interesting behavior. That is that if your case statement contains a variable declaration, then the value specified in the switch part is copied into the variable declared in the case. In the following example, the value true is copied into the local variable b. Further to that, the variable b is unused, and exists only so that the when clause to the case statement can exist:
switch(true)
{
case bool b when houseName.Equals("X", StringComparison.InvariantCultureIgnoreCase):
windowName = "X-Window";):
break;
}
As #LewisM points out, this can be used to benefit - that benefit being that the thing being compared is actually in the switch statement, as it is with the classical use of the switch statement. Also, the temporary values declared in the case statement can prevent unwanted or inadvertent changes to the original value:
switch(houseName)
{
case string hn when hn.Equals("X", StringComparison.InvariantCultureIgnoreCase):
windowName = "X-Window";
break;
}

As you seem to be aware, lowercasing two strings and comparing them is not the same as doing an ignore-case comparison. There are lots of reasons for this. For example, the Unicode standard allows text with diacritics to be encoded multiple ways. Some characters includes both the base character and the diacritic in a single code point. These characters may also be represented as the base character followed by a combining diacritic character. These two representations are equal for all purposes, and the culture-aware string comparisons in the .NET Framework will correctly identify them as equal, with either the CurrentCulture or the InvariantCulture (with or without IgnoreCase). An ordinal comparison, on the other hand, will incorrectly regard them as unequal.
Unfortunately, switch doesn't do anything but an ordinal comparison. An ordinal comparison is fine for certain kinds of applications, like parsing an ASCII file with rigidly defined codes, but ordinal string comparison is wrong for most other uses.
What I have done in the past to get the correct behavior is just mock up my own switch statement. There are lots of ways to do this. One way would be to create a List<T> of pairs of case strings and delegates. The list can be searched using the proper string comparison. When the match is found then the associated delegate may be invoked.
Another option is to do the obvious chain of if statements. This usually turns out to be not as bad as it sounds, since the structure is very regular.
The great thing about this is that there isn't really any performance penalty in mocking up your own switch functionality when comparing against strings. The system isn't going to make a O(1) jump table the way it can with integers, so it's going to be comparing each string one at a time anyway.
If there are many cases to be compared, and performance is an issue, then the List<T> option described above could be replaced with a sorted dictionary or hash table. Then the performance may potentially match or exceed the switch statement option.
Here is an example of the list of delegates:
delegate void CustomSwitchDestination();
List<KeyValuePair<string, CustomSwitchDestination>> customSwitchList;
CustomSwitchDestination defaultSwitchDestination = new CustomSwitchDestination(NoMatchFound);
void CustomSwitch(string value)
{
foreach (var switchOption in customSwitchList)
if (switchOption.Key.Equals(value, StringComparison.InvariantCultureIgnoreCase))
{
switchOption.Value.Invoke();
return;
}
defaultSwitchDestination.Invoke();
}
Of course, you will probably want to add some standard parameters and possibly a return type to the CustomSwitchDestination delegate. And you'll want to make better names!
If the behavior of each of your cases is not amenable to delegate invocation in this manner, such as if differnt parameters are necessary, then you’re stuck with chained if statments. I’ve also done this a few times.
if (s.Equals("house", StringComparison.InvariantCultureIgnoreCase))
{
s = "window";
}
else if (s.Equals("business", StringComparison.InvariantCultureIgnoreCase))
{
s = "really big window";
}
else if (s.Equals("school", StringComparison.InvariantCultureIgnoreCase))
{
s = "broken window";
}

An extension to the answer by #STLDeveloperA. A new way to do statement evaluation without multiple if statements as of C# 7 is using the pattern matching switch statement, similar to the way #STLDeveloper though this way is switching on the variable being switched
string houseName = "house"; // value to be tested
string s;
switch (houseName)
{
case var name when string.Equals(name, "Bungalow", StringComparison.InvariantCultureIgnoreCase):
s = "Single glazed";
break;
case var name when string.Equals(name, "Church", StringComparison.InvariantCultureIgnoreCase):
s = "Stained glass";
break;
...
default:
s = "No windows (cold or dark)";
break;
}
The visual studio magazine has a nice article on pattern matching case blocks that might be worth a look.

In some cases it might be a good idea to use an enum. So first parse the enum (with ignoreCase flag true) and than have a switch on the enum.
SampleEnum Result;
bool Success = SampleEnum.TryParse(inputText, true, out Result);
if(!Success){
//value was not in the enum values
}else{
switch (Result) {
case SampleEnum.Value1:
break;
case SampleEnum.Value2:
break;
default:
//do default behaviour
break;
}
}

One possible way would be to use an ignore case dictionary with an action delegate.
string s = null;
var dic = new Dictionary<string, Action>(StringComparer.CurrentCultureIgnoreCase)
{
{"house", () => s = "window"},
{"house2", () => s = "window2"}
};
dic["HouSe"]();
// Note that the call doesn't return text, but only populates local variable s.
// If you want to return the actual text, replace Action to Func<string> and values in dictionary to something like () => "window2"

Here's a solution that wraps #Magnus 's solution in a class:
public class SwitchCaseIndependent : IEnumerable<KeyValuePair<string, Action>>
{
private readonly Dictionary<string, Action> _cases = new Dictionary<string, Action>(StringComparer.OrdinalIgnoreCase);
public void Add(string theCase, Action theResult)
{
_cases.Add(theCase, theResult);
}
public Action this[string whichCase]
{
get
{
if (!_cases.ContainsKey(whichCase))
{
throw new ArgumentException($"Error in SwitchCaseIndependent, \"{whichCase}\" is not a valid option");
}
//otherwise
return _cases[whichCase];
}
}
public IEnumerator<KeyValuePair<string, Action>> GetEnumerator()
{
return _cases.GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return _cases.GetEnumerator();
}
}
Here's an example of using it in a simple Windows Form's app:
var mySwitch = new SwitchCaseIndependent
{
{"hello", () => MessageBox.Show("hello")},
{"Goodbye", () => MessageBox.Show("Goodbye")},
{"SoLong", () => MessageBox.Show("SoLong")},
};
mySwitch["HELLO"]();
If you use lambdas (like the example), you get closures which will capture your local variables (pretty close to the feeling you get from a switch statement).
Since it uses a Dictionary under the covers, it gets O(1) behavior and doesn't rely on walking through the list of strings. Of course, you need to construct that dictionary, and that probably costs more. If you want to reuse the Switch behavior over and over, you can create and initialize the the SwitchCaseIndependent object once and then use it as many times as you want.
It would probably make sense to add a simple bool ContainsCase(string aCase) method that simply calls the dictionary's ContainsKey method.

I would say that with switch expressions (added in C# 8.0), discard patterns and local functions the approaches suggested by #STLDev and #LewisM can be rewritten in even more clean/shorter way:
string houseName = "house"; // value to be tested
// local method to compare, I prefer to put them at the bottom of the invoking method:
bool Compare(string right) => string.Equals(houseName, right, StringComparison.InvariantCultureIgnoreCase);
var s = houseName switch
{
_ when Compare("Bungalow") => "Single glazed",
_ when Compare("Church") => "Stained glass",
// ...
_ => "No windows (cold or dark)" // default value
};

It should be sufficient to do this:
string s = "houSe";
switch (s.ToLowerInvariant())
{
case "house": s = "window";
break;
}
The switch comparison is thereby culture invariant. As far as I can see this should achieve the same result as the C#7 Pattern-Matching solutions, but more succinctly.

I hope this helps try to convert the whole string into particular case either lower case or Upper case and use the Lowercase string for comparison:
public string ConvertMeasurements(string unitType, string value)
{
switch (unitType.ToLower())
{
case "mmol/l": return (Double.Parse(value) * 0.0555).ToString();
case "mg/dl": return (double.Parse(value) * 18.0182).ToString();
}
}

Using the Case Insensitive Comparison:
Comparing strings while ignoring case.
switch (caseSwitch)
{
case string s when s.Equals("someValue", StringComparison.InvariantCultureIgnoreCase):
// ...
break;
}
for more detail Visit this link: Switch Case When In C# Statement And Expression

Now you can use the switch expression (rewrote the previous example):
return houseName switch
{
_ when houseName.Equals("MyHouse", StringComparison.InvariantCultureIgnoreCase) => "MyWindow",
_ when houseName.Equals("YourHouse", StringComparison.InvariantCultureIgnoreCase) => "YourWindow",
_ when houseName.Equals("House", StringComparison.InvariantCultureIgnoreCase) => "Window",
_ => null
};

How would you make this switch statement as fast as possible?

2009-12-04 UPDATE: For profiling results on a number of the suggestions posted here, see below!
The Question
Consider the following very harmless, very straightforward method, which uses a switch statement to return a defined enum value:
public static MarketDataExchange GetMarketDataExchange(string ActivCode) {
if (ActivCode == null) return MarketDataExchange.NONE;
switch (ActivCode) {
case "": return MarketDataExchange.NBBO;
case "A": return MarketDataExchange.AMEX;
case "B": return MarketDataExchange.BSE;
case "BT": return MarketDataExchange.BATS;
case "C": return MarketDataExchange.NSE;
case "MW": return MarketDataExchange.CHX;
case "N": return MarketDataExchange.NYSE;
case "PA": return MarketDataExchange.ARCA;
case "Q": return MarketDataExchange.NASDAQ;
case "QD": return MarketDataExchange.NASDAQ_ADF;
case "W": return MarketDataExchange.CBOE;
case "X": return MarketDataExchange.PHLX;
case "Y": return MarketDataExchange.DIRECTEDGE;
}
return MarketDataExchange.NONE;
}
My colleague and I batted around a few ideas today about how to actually make this method faster, and we came up with some interesting modifications that did in fact improve its performance rather significantly (proportionally speaking, of course). I'd be interested to know what sorts of optimizations anyone else out there can think up that might not have occurred to us.
Right off the bat, let me just offer a quick disclaimer: this is for fun, and not to fuel the whole "to optimize or not to optimize" debate. That said, if you count yourself among those who dogmatically believe "premature optimization is the root of all evil," just be aware that I work for a high-frequency trading firm, where everything needs to run absolutely as fast as possible--bottleneck or not. So, even though I'm posting this on SO for fun, it isn't just a huge waste of time, either.
One more quick note: I'm interested in two kinds of answers--those that assume every input will be a valid ActivCode (one of the strings in the switch statement above), and those that do not. I am almost certain that making the first assumption allows for further speed improvements; anyway, it did for us. But I know that improvements are possible either way.
The Results
Well, it turns out that the fastest solution so far (that I've tested) came from João Angelo, whose suggestion was actually very simple, yet extremely clever. The solution that my coworker and I had devised (after trying out several approaches, many of which were thought up here as well) came in second place; I was going to post it, but it turns out that Mark Ransom came up with the exact same idea, so just check out his answer!
Since I ran these tests, some other users have posted even newer ideas... I will test them in due time, when I have a few more minutes to spare.
I ran these tests on two different machines: my personal computer at home (a dual-core Athlon with 4 Gb RAM running Windows 7 64-bit) and my development machine at work (a dual-core Athlon with 2 Gb RAM running Windows XP SP3). Obviously, the times were different; however, the relative times, meaning, how each method compared to every other method, were the same. That is to say, the fastest was the fastest on both machines, etc.
Now for the results. (The times I'm posting below are from my home computer.)
But first, for reference--the original switch statement:
1000000 runs: 98.88 ms
Average: 0.09888 microsecond
Fastest optimizations so far:
João Angelo's idea of assigning values to the enums based on the hash codes of the ActivCode strings and then directly casing ActivCode.GetHashCode() to MarketDataExchange:
1000000 runs: 23.64 ms
Average: 0.02364 microsecond
Speed increase: 329.90%
My colleague's and my idea of casting ActivCode[0] to an int and retrieving the appropriate MarketDataExchange from an array initialized on startup (this exact same idea was suggested by Mark Ransom):
1000000 runs: 28.76 ms
Average: 0.02876 microsecond
Speed increase: 253.13%
tster's idea of switching on the output of ActivCode.GetHashCode() instead of ActivCode:
1000000 runs: 34.69 ms
Average: 0.03469 microsecond
Speed increase: 185.04%
The idea, suggested by several users including Auraseer, tster, and kyoryu, of switching on ActivCode[0] instead of ActivCode:
1000000 runs: 36.57 ms
Average: 0.03657 microsecond
Speed increase: 174.66%
Loadmaster's idea of using the fast hash, ActivCode[0] + ActivCode[1]*0x100:
1000000 runs: 39.53 ms
Average: 0.03953 microsecond
Speed increase: 153.53%
Using a hashtable (Dictionary<string, MarketDataExchange>), as suggested by many:
1000000 runs: 88.32 ms
Average: 0.08832 microsecond
Speed increase: 12.36%
Using a binary search:
1000000 runs: 1031 ms
Average: 1.031 microseconds
Speed increase: none (performance worsened)
Let me just say that it has been really cool to see how many different ideas people had on this simple problem. This was very interesting to me, and I'm quite thankful to everyone who has contributed and made a suggestion so far.

Assuming every input will be a valid ActivCode, that you can change the enumeration values and highly coupled to the GetHashCode implementation:
enum MarketDataExchange
{
NONE,
NBBO = 371857150,
AMEX = 372029405,
BSE = 372029408,
BATS = -1850320644,
NSE = 372029407,
CHX = -284236702,
NYSE = 372029412,
ARCA = -734575383,
NASDAQ = 372029421,
NASDAQ_ADF = -1137859911,
CBOE = 372029419,
PHLX = 372029430,
DIRECTEDGE = 372029429
}
public static MarketDataExchange GetMarketDataExchange(string ActivCode)
{
if (ActivCode == null) return MarketDataExchange.NONE;
return (MarketDataExchange)ActivCode.GetHashCode();
}

I'd roll my own fast hash function and use an integer switch statement to avoid string comparisons:
int h = 0;
// Compute fast hash: A[0] + A[1]*0x100
if (ActivCode.Length > 0)
h += (int) ActivCode[0];
if (ActivCode.Length > 1)
h += (int) ActivCode[1] << 8;
// Find a match
switch (h)
{
case 0x0000: return MarketDataExchange.NBBO; // ""
case 0x0041: return MarketDataExchange.AMEX; // "A"
case 0x0042: return MarketDataExchange.BSE; // "B"
case 0x5442: return MarketDataExchange.BATS; // "BT"
case 0x0043: return MarketDataExchange.NSE; // "C"
case 0x574D: return MarketDataExchange.CHX; // "MW"
case 0x004E: return MarketDataExchange.NYSE; // "N"
case 0x4150: return MarketDataExchange.ARCA; // "PA"
case 0x0051: return MarketDataExchange.NASDAQ; // "Q"
case 0x4451: return MarketDataExchange.NASDAQ_ADF; // "QD"
case 0x0057: return MarketDataExchange.CBOE; // "W"
case 0x0058: return MarketDataExchange.PHLX; // "X"
case 0x0059: return MarketDataExchange.DIRECTEDGE; // "Y"
default: return MarketDataExchange.NONE;
}
My tests show that this is about 4.5 times faster than the original code.
If C# had a preprocessor, I'd use a macro to form the case constants.
This technique is faster than using a hash table and certainly faster than using string comparisons. It works for up to four-character strings with 32-bit ints, and up to 8 characters using 64-bit longs.

If you know how often the various codes show up, the more common ones should go at the top of the list, so fewer comparisons are done. But let's assume you don't have that.
Assuming the ActivCode is always valid will of course speed things up. You don't need to test for null or the empty string, and you can leave off one test from the end of the switch. That is, test for everything except Y, and then return DIRECTEDGE if no match is found.
Instead of switching on the whole string, switch on its first letter. For the codes that have more letters, put a second test inside the switch case. Something like this:
switch(ActivCode[0])
{
//etc.
case 'B':
if ( ActivCode.Length == 1 ) return MarketDataExchange.BSE;
else return MarketDataExchange.BATS;
// etc.
}
It would be better if you could go back and change the codes so they are all single characters, because you would then never need more than one test. Better yet would be using the numerical value of the enum, so you can simply cast instead of having to switch/translate in the first place.

I'd use a dictionary for the key value pairs and take advantage of the O(1) lookup time.

Do you have any statistics on which strings are more common? So that those could be checked first?

With a valid input could use
if (ActivCode.Length == 0)
return MarketDataExchange.NBBO;
if (ActivCode.Length == 1)
return (MarketDataExchange) (ActivCode[0]);
return (MarketDataExchange) (ActivCode[0] | ActivCode[1] << 8);

Change the switch to switch on the HashCode() of the strings.

I'd extrapolate tster's reply to "switch over a custom hash function", assuming that the code generator creates a lookup table, or - failing that - building the lookup table myself.
The custom hash function should be simple, e.g.:
(int)ActivCode[0]*2 + ActivCode.Length-1
This would require a table of 51 elements, easily kept in L1 cache, under the following assumptions:
Input data must already be validated
empty string must be handled sepsarately
no two-character-codes start with the same character
adding new cases is hard
the empty string case could be incorporated if you could use an unsafe access to ActivCode[0] yielding the '\0' terminator.

Forgive me if I get something wrong here, I'm extrapolating from my knowledge of C++. For example, if you take ActivCode[0] of an empty string, in C++ you get a character whose value is zero.
Create a two dimensional array which you initialize once; the first dimension is the length of the code, the second is a character value. Populate with the enumeration value you'd like to return. Now your entire function becomes:
public static MarketDataExchange GetMarketDataExchange(string ActivCode) {
return LookupTable[ActivCode.Length][ActivCode[0]];
}
Lucky for you all the two-character codes are unique in the first letter compared to the other two-character codes.

I would put it in dictionary instead of using a switch statement. That being said, it may not make a difference. Or it might. See C# switch statement limitations - why?.

Avoid all string comparisons.
Avoid looking at more than a single character (ever)
Avoid if-else since I want the compiler to be able optimize the best it can
Try to get the result in a single switch jump
code:
public static MarketDataExchange GetMarketDataExchange(string ActivCode) {
if (ActivCode == null) return MarketDataExchange.NONE;
int length = ActivCode.Length;
if (length == 0) return MarketDataExchange.NBBO;
switch (ActivCode[0]) {
case 'A': return MarketDataExchange.AMEX;
case 'B': return (length == 2) ? MarketDataExchange.BATS : MarketDataExchange.BSE;
case 'C': return MarketDataExchange.NSE;
case 'M': return MarketDataExchange.CHX;
case 'N': return MarketDataExchange.NYSE;
case 'P': return MarketDataExchange.ARCA;
case 'Q': return (length == 2) ? MarketDataExchange.NASDAQ_ADF : MarketDataExchange.NASDAQ;
case 'W': return MarketDataExchange.CBOE;
case 'X': return MarketDataExchange.PHLX;
case 'Y': return MarketDataExchange.DIRECTEDGE;
default: return MarketDataExchange.NONE;
}
}

Trade memory for speed by pre-populating an index table to leverage simple pointer arithmetic.
public class Service
{
public static MarketDataExchange GetMarketDataExchange(string ActivCode) {
{
int x = 65, y = 65;
switch(ActivCode.Length)
{
case 1:
x = ActivCode[0];
break;
case 2:
x = ActivCode[0];
y = ActivCode[1];
break;
}
return _table[x, y];
}
static Service()
{
InitTable();
}
public static MarketDataExchange[,] _table =
new MarketDataExchange['Z','Z'];
public static void InitTable()
{
for (int x = 0; x < 'Z'; x++)
for (int y = 0; y < 'Z'; y++)
_table[x, y] = MarketDataExchange.NONE;
SetCell("", MarketDataExchange.NBBO);
SetCell("A", MarketDataExchange.AMEX);
SetCell("B", MarketDataExchange.BSE);
SetCell("BT", MarketDataExchange.BATS);
SetCell("C", MarketDataExchange.NSE);
SetCell("MW", MarketDataExchange.CHX);
SetCell("N", MarketDataExchange.NYSE);
SetCell("PA", MarketDataExchange.ARCA);
SetCell("Q", MarketDataExchange.NASDAQ);
SetCell("QD", MarketDataExchange.NASDAQ_ADF);
SetCell("W", MarketDataExchange.CBOE);
SetCell("X", MarketDataExchange.PHLX);
SetCell("Y", MarketDataExchange.DIRECTEDGE);
}
private static void SetCell(string s, MarketDataExchange exchange)
{
char x = 'A', y = 'A';
switch(s.Length)
{
case 1:
x = s[0];
break;
case 2:
x = s[0];
y = s[1];
break;
}
_table[x, y] = exchange;
}
}
Make the enum byte-based to save a little space.
public enum MarketDataExchange : byte
{
NBBO, AMEX, BSE, BATS, NSE, CHX, NYSE, ARCA,
NASDAQ, NASDAQ_ADF, CBOE, PHLIX, DIRECTEDGE, NONE
}

If the enumeration values are arbitrary you could do this...
public static MarketDataExchange GetValue(string input)
{
switch (input.Length)
{
case 0: return MarketDataExchange.NBBO;
case 1: return (MarketDataExchange)input[0];
case 2: return (MarketDataExchange)(input[0] << 8 | input[1]);
default: return MarketDataExchange.None;
}
}
... if you want to go totally nuts you can also use an unsafe call with pointers as noted by Pavel Minaev ...
The pure cast version above is faster than this unsafe version.
unsafe static MarketDataExchange GetValue(string input)
{
if (input.Length == 1)
return (MarketDataExchange)(input[0]);
fixed (char* buffer = input)
return (MarketDataExchange)(buffer[0] << 8 | buffer[1]);
}
public enum MarketDataExchange
{
NBBO = 0x00, //
AMEX = 0x41, //A
BSE = 0x42, //B
BATS = 0x4254, //BT
NSE = 0x43, //C
CHX = 0x4D57, //MW
NYSE = 0x4E, //N
ARCA = 0x5041, //PA
NASDAQ = 0x51, //Q
NASDAQ_ADF = 0x5144, //QD
CBOE = 0x57, //W
PHLX = 0x58, //X
DIRECTEDGE = 0x59, //Y
None = -1
}

+1 for using a dictionary. Not necessarily for optimization, but it'd be cleaner.
I would probably use constants for the strings as well, though i doubt that'd buy you anything performance wise.

Messy but using a combination of nested ifs and hard coding might just beat the optimiser:-
if (ActivCode < "N") {
// "" to "MW"
if (ActiveCode < "BT") {
// "" to "B"
if (ActiveCode < "B") {
// "" or "A"
if (ActiveCode < "A") {
// must be ""
retrun MarketDataExchange.NBBO;
} else {
// must be "A"
return MarketDataExchange.AMEX;
}
} else {
// must be "B"
return MarketDataExchange.BSE;
}
} else {
// "BT" to "MW"
if (ActiveCode < "MW") {
// "BT" or "C"
if (ActiveCode < "C") {
// must be "BT"
retrun MarketDataExchange.NBBO;
} else {
// must be "C"
return MarketDataExchange.NSE;
}
} else {
// must be "MV"
return MarketDataExchange.CHX;
}
}
} else {
// "N" TO "Y"
if (ActiveCode < "QD") {
// "N" to "Q"
if (ActiveCode < "Q") {
// "N" or "PA"
if (ActiveCode < "PA") {
// must be "N"
retrun MarketDataExchange.NYSE;
} else {
// must be "PA"
return MarketDataExchange.ARCA;
}
} else {
// must be "Q"
return MarketDataExchange.NASDAQ;
}
} else {
// "QD" to "Y"
if (ActiveCode < "X") {
// "QD" or "W"
if (ActiveCode < "W") {
// must be "QD"
retrun MarketDataExchange.NASDAQ_ADF;
} else {
// must be "W"
return MarketDataExchange.CBOE;
}
} else {
// "X" or "Y"
if (ActiveCode < "Y") {
// must be "X"
retrun MarketDataExchange.PHLX;
} else {
// must be "Y"
return MarketDataExchange.DIRECTEDGE;
}
}
}
}
This gets the right function with three or four compares. I wouldnt even think of doing this for real unless your piece of code is expected to run several times a second!
You further otimise it so that only single character compares occurred.
e.g. replace '< "BT" ' with '>= "B" ' -- ever so slightly faster and even less readable!

All your strings are at most 2 chars long, and ASCII, so we can use 1 byte per char.
Furthermore, more likely than not, they also never can have \0 appear in them (.NET string allows for embedded null characters, but many other things don't). With that assumption, we can null-pad all your strings to be exactly 2 bytes each, or an ushort:
"" -> (byte) 0 , (byte) 0 -> (ushort)0x0000
"A" -> (byte)'A', (byte) 0 -> (ushort)0x0041
"B" -> (byte)'B', (byte) 0 -> (ushort)0x0042
"BT" -> (byte)'B', (byte)'T' -> (ushort)0x5442
Now that we have a single integer in a relatively (64K) short range, we can use a lookup table:
MarketDataExchange[] lookup = {
MarketDataExchange.NBBO,
MarketDataExchange.NONE,
MarketDataExchange.NONE,
...
/* at index 0x041 */
MarketDataExchange.AMEX,
MarketDataExchange.BSE,
MarketDataExchange.NSE,
...
};
Now, obtaining the value given a string is:
public static unsafe MarketDataExchange GetMarketDataExchange(string s)
{
// Assume valid input
if (s.Length == 0) return MarketDataExchange.NBBO;
// .NET strings always have '\0' after end of data - abuse that
// to avoid extra checks for 1-char strings. Skip index checks as well.
ushort hash;
fixed (char* data = s)
{
hash = (ushort)data[0] | ((ushort)data[1] << 8);
}
return lookup[hash];
}

Put the cases in a sorted structure with non linear access (like a hash table).
The switch that you have will have a linear time.

You can get a mild speed-up by ordering the codes according to which ones are most used.
But I agree with Cletus: the best speed-up I can think of would be to use a hash map with plenty of room (so that there are no collisions.)

A couple of random thoughts, that may not all be applicable together:
Switch on the first character in the string, rather than the string itself, and do a sub-switch for strings which can contain more than one letter?
A hashtable would certainly guarantee O(1) retrieval, though it might not be faster for smaller numbers of comparisons.
Don't use strings, use enums or something like a flyweight instead. Using strings in this case seems a bit fragile anyway...
And if you really need it to be as fast as possible, why aren't you writing it in assembly? :)

Can we cast the ActivCode to int and then use int in our case statements?

Use the length of the code to create a unique value from that code instead of using GetHashCode() . It turns out there are no collisions if you use the first letter of the code shifted by the length of the code. This reduces the cost to two comparisons, one array index and one shift (on average).
public static MarketDataExchange GetMarketDataExchange(string ActivCode)
{
if (ActivCode == null)
return MarketDataExchange.NONE;
if (ActivCode.Length == 0)
return MarketDataExchange.NBBO;
return (MarketDataExchange)((ActivCode[0] << ActivCode.Length));
}
public enum MarketDataExchange
{
NONE = 0,
NBBO = 1,
AMEX = ('A'<<1),
BSE = ('B'<<1),
BATS = ('B'<<2),
NSE = ('C'<<1),
CHX = ('M'<<2),
NYSE = ('N'<<1),
ARCA = ('P'<<2),
NASDAQ = ('Q'<<1),
NASDAQ_ADF = ('Q'<<2),
CBOE = ('W'<<1),
PHLX = ('X'<<1),
DIRECTEDGE = ('Y'<<1),
}

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Regex match with dictionary key - c#

Related

Parsing CSV data using a finite state machine

Y/N or y/n in loop

Parsing custom data tokens and replacing with values in C#

How to make C# Switch Statement use IgnoreCase

How would you make this switch statement as fast as possible?

Categories

Resources