Parsing custom data tokens and replacing with values in C# - c#

I have about 10 pieces of data from a record and I want to be able to define the layout of a string where this data is returned, with the option of leaving some pieces out. My thought was to use an enum to give integer values to my tokens/fields and then have a format like {0}{1}{2}{3} or something as complicated as {4} - {3}{1} [{8}]. The meaning of the tokens relates to fields in my database. For instance I have this enum for my tokens relating to payments made.
AccountMask = 0,
AccountLast4 = 1,
AccountFirstDigit = 2,
AccountFirstLetter = 3,
ItemNumber = 4,
Amount = 5
Account mask is a string like VXXXXX1234 where the V is for a visa, and 1234 are the last 4 digits of the card. Sometimes clients wants the V, sometimes they want the first digit (It's easy to translate a card type into a first digit).
My goal is to create something reusable to generate a string using tokens in a format string that will then use the data associated with the digit inside the token to do an in place replace of the data.
So, for an example using the mask above and my enum if I wanted to define a format 9{2}{1}{4:[0:0000000000]}
if the item number is 678934
which would then translate to 9412340000678934 where the inner part of token 4 becomes a definition for a String.Format of that value. Also, the data placed around the tokens is ignored and kept in place.
My issue comes to the manipulation of the string and the best practice. I have been told that regular expressions can be costly if you are going to create them on the fly. As a CS major, I have a feeling the "right" (however complex) solution is to make a lexer/parser for my tokens. I have no experience writing a lexer/parse in C# so I'm not sure of the best practices around it. I'm looking for guidance here on a system that is efficient and easy to tweak.

I see this problem, and I immediately thought that the solution is pretty simple; store the masks you wish to use, with constant values for various data fields you wish to include, and pass the masks into a constant call to String.Format():
const string CreditCardWithFirstLetterMask = "{3}XXXXXXXXXXX{1}";
const string CreditCardWithFirstDigitMask = "{2}XXXXXXXXXXX{1}";
...
var formattedString = String.Format(CreditCardWithFirstDigitMask,
record.AccountMask,
record.AccountLast4,
record.AccountFirstDigit,
record.AccountFirstLetter,
record.ItemNumber,
record.Amount);

I ended up putting the regex as a static object in the class and then looping through matches to perform replacements and build out my token.
var token = request.TokenFormat;
var matches = tokenExpression.Matches(request.TokenFormat);
foreach (Match match in matches)
{
var value = match.Value;
var tokenCode = (Token)Convert.ToInt32(value.Substring(1, (value.Contains(":") ? value.IndexOf(":") : value.IndexOf("}")) - 1));
object data = null;
switch (tokenCode)
{
case Token.AccountMask:
data = accountMask;
break;
case Token.AccountLast4:
data = accountMask.Substring(accountMask.Length - 4);
break;
case Token.AccountFirstDigit:
string firstLetter = accountMask.Substring(0, 1);
switch (firstLetter.ToUpper())
{
case "A":
data = 3;
break;
case "V":
data = 4;
break;
case "M":
data = 5;
break;
case "D":
data = 6;
break;
}
break;
case Token.AccountFirstLetter:
data = accountMask.Substring(0, 1);
break;
case Token.ItemNumber:
if(item != null)
data = item.PaymentId;
break;
case Token.Amount:
if (item != null)
data = item.Amount;
break;
case Token.PaymentMethodId:
if (paymentMethod != null)
data = paymentMethod.PaymentMethodId;
break;
}
if (formatExpression.IsMatch(value))
{
Match formatMatch = formatExpression.Match(value);
string format = formatMatch.Value.Replace("[", "{").Replace("]", "}");
token = token.Replace(value, String.Format(format, data));
}
else
{
token = token.Replace(value, String.Format("{0}", data));
}
}
return token;

Related

Parsing CSV data using a finite state machine

I want to read a file containing comma-separated values, so have written a finite state machine:
private IList<string> Split(string line)
{
List<string> values = new List<string>();
string value = string.Empty;
ParseState state = ParseState.Initial;
foreach (char c in line)
{
switch (state)
{
case ParseState.Initial:
switch (c)
{
case COMMA:
values.Add(string.Empty);
break;
case QUOTE:
state = ParseState.Quote;
break;
default:
value += c;
state = ParseState.Data;
break;
}
break;
case ParseState.Data:
switch (c)
{
case COMMA:
values.Add(value);
value = string.Empty;
state = ParseState.Initial;
break;
case QUOTE:
throw new InvalidDataException("Improper quotes");
default:
value += c;
break;
}
break;
case ParseState.Quote:
switch (c)
{
case QUOTE:
state = ParseState.QuoteInQuote;
break;
default:
value += c;
break;
}
break;
case ParseState.QuoteInQuote:
switch (c)
{
case COMMA:
values.Add(value);
value = string.Empty;
state = ParseState.Initial;
break;
case QUOTE:
value += c;
state = ParseState.Quote;
break;
default:
throw new InvalidDataException("Unpaired quotes");
}
break;
}
}
switch (state)
{
case ParseState.Initial:
case ParseState.Data:
case ParseState.QuoteInQuote:
values.Add(value);
break;
case ParseState.Quote:
throw new InvalidDataException("Unclosed quotes");
}
return values;
}
Yes, I know the advice about CSV parsers is "don't write your own", but
I needed it quickly and
our download policy at work would take several days to allow me to
get open source off the 'net.
Hey, at least I didn't start with string.Split() or, worse, try using a Regex!
And yes, I know it could be improved by using a StringBuilder, and it's restrictive on quotes in the data, but
performance is not an issue and
this is only to generate well-defined test data in-house,
so I don't care about those.
What I do care about is the apparent trailing block at the end for mopping up all the data after the final comma, and the way that it's starting to look like some sort of an anti-pattern down there, which was exactly the sort of thing that "good" patterns such as a FSM were supposed to avoid.
So my question is this: is this block at the end some sort of anti-pattern, and is it something that's going to come back to bite me in the future?
All of the FSMs I've ever seen (not that I go hunting for them, mind you) all have some kind of "mopping up" step, simply due to the nature of enumeration.
In an FSM you're always acting upon the current state, and then resetting the 'current state' for the next iteration, so once you've hit the end of your iterations you have to do one last operation to act upon the 'current state'. (Might be better to think about it as acting upon the 'previous state' and setting the 'current state').
Therefore, I would consider that what you've done is part of the pattern.
But why didn't you try some of the other answers on SO?
Split CSV String (specifically this answer)
How to properly split a CSV using C# split() function? (specifically this answer)
Adapted solution, still an FSM:
public IEnumerable<string> fsm(string s)
{
int i, a = 0, l = s.Length;
var q = true;
for (i = 0; i < l; i++)
{
switch (s[i])
{
case ',':
if (q)
{
yield return s.Substring(a, i - a).Trim();
a = i + 1;
}
break;
// pick your flavor
case '"':
//case '\'':
q = !q;
break;
}
}
yield return s.Substring(a).Trim();
}
// === usage ===
var row = fsm(csvLine).ToList();
foreach(var column in fsm(csvLine)) { ... }
In a FSM you identify which states are the permitted halting states. So in a typical implementation, when you come out of the loop you need to at least check to make sure that your last state is one of the permitting halting states or throw a jam error. So having that one last state check outside of the loop is part of the pattern.
The source of the problem, if you want to call it that, is the absence of an end-of-line marker in your input data. Add a newline character, for example, at the end of your input string and you will be able to get rid of the "trailing block" that seems to annoy you so much.
As far as I'm concerned, your code is correct and, no, there is no reason why this implementation will come back to bite you in the future!
I had a similiar issue, but i was parsing a text file character by character. I didnt like this big clean-up-switch-block after the while loop. To solve this, I made a wrapper for the streamreader. The wrapper checked when streamreader had no characters left. In this case, the wrapper would return an EOT-ascii character once (EOT is equal to EOF). This way my state machine could react to the EOF depending on the state it was in at that moment.

Regex match with dictionary key

I'm expanding upon my previously asked question.
Before I explain, here is my C# code:
static Dictionary<string, string> phenom = new Dictionary<string, string>
{
{"-", "Light"},
{"\\+", "Heavy"},
{"\bVC\b","In the Vicinity"},
// descriptor
{"MI","Shallow"},
{"PR","Partial"},
{"BC","Patches"},
{"DR","Low Drifting"},
{"BL","Blowing"},
{"SH","Showers"},
{"TS","Thunderstorm"},
{"FZ","Freezing"},
// precipitation
{"DZ","Drizzle"},
{"RA","Rain"},
{"SN","Snow"},
{"SG","Snow Grains"},
{"IC","Ice Crystals"},
{"PL","Ice Pellets"},
{"GR","Hail"},
{"GS","Small Hail/Snow Pellets"},
{"UP","Uknown Precipitation"},
// obscuration
{"BR","Mist"},
{"FG","Fog"},
{"FU","Smoke"},
{"VA","Volcanic Ash"},
{"DU","Widespread Dust"},
{"SA","Sand"},
{"HZ","Haze"},
{"PY","Spray"},
// other
{"PO","Well-Developed Dust/Sand Whirls"},
{"SQ","Squalls"},
{"FC","Funnel Cloud Tornado Waterspout"},
{"SS","Sandstorm"},
{"DS","Duststorm"}
};
public static string Process(String metar)
{
metar = Regex.Replace(metar, "(?<=A[0-9]{4}).*", "");
StringBuilder _string = new StringBuilder();
var results = from result in phenom
where Regex.Match(metar, result.Key, RegexOptions.Singleline).Success
select result;
foreach (var result in results)
{
if (result.Key == "DZ" || result.Key == "RA" || result.Key == "SN" || result.Key == "SG" || result.Key == "PL")
{
switch (result.Key)
{
case "+":
_string.Append("Heavy ");
break;
case "-":
_string.Append("Light");
break;
case "VC":
_string.Append("In the Vicinity ");
break;
default:
break;
}
}
_string.AppendFormat("{0} ", result.Value);
}
return _string.ToString();
}
Basically, this code parses the weather phenomenons in an airport METAR weather report. Take the following METAR for example:
KAMA 230623Z AUTO 05016KT 7SM +TSRAGR FEW070 BKN090 OVC110 16/14 A3001
RMK AO2 PK WND 06026/0601 LTG DSNT E-SW RAB04 TSB02E17 P0005 T01560144
Notice the +TSRAGR...this is the portion I want to focus on. The code I currently have works fine...but not close to what I actually want. It spits out this: "Heavy Thunderstorm Rain Hail".
The actual decoding differs slightly. Referenced from this manual (see the 5th page).
The intensity indicators (+ and -) always come before the first precipitation. In the above METAR, it would be RA. So, ideally it should spit out "Thunderstorm Heavy Rain Hail", instead. However, it is not. There are other times when the phenomena is not as intense...sometimes it may only be -RA, in which case should only return "Light Rain".
Also to note, as the manual I referenced says, the intensity identifiers are not coded with the IC, GR, GS or UP precipitation types, which explains my attempt at checking the key value before appending the intensity, but failed.
If someone could point me in the right direction on how to append the intensity identifiers in the right location, I'd much appreciate it.
TL;DR...basically, I some code logic to decide where to put prefixes before specific items from a dictionary list.
Given the structure of your code, I would recommend pre-parsing your input string to "move" the intensity indicators to after the terms they don't modify, so the words come out in the order that makes the most sense.
Then, I might consider adding a few keys to your dictionary to cover the +/- versions of each type of precipitation that can have an intensity indicator, so your overall code is simplified.

Slimming down a switch statement

Wondering if there are good alternatives to this that perform no worse than what I have below? The real switch statement has additional sections for other non-English characters.
Note that I'd love to put multiple case statements per line, but StyleCop doesn't like it and will fail our release build as a result.
var retVal = String.Empty;
switch(valToCheck)
{
case "é":
case "ê":
case "è":
case "ë":
retVal = "e";
break;
case "à":
case "â":
case "ä":
case "å":
retVal = "a";
break;
default:
retVal = "-";
break;
}
The first thing that comes to mind is a Dictionary<char,char>()
(I prefer char instead of strings because you are dealing with chars)
Dictionary<char,char> dict = new Dictionary<char,char>();
dict.Add('å', 'a');
......
then you could remove your entire switch
char retValue;
char testValue = 'å';
if(dict.TryGetValue(testValue, out retValue) == false)
retVal = '-';
Well, start off by doing this transformation.
public class CharacterSanitizer
{
private static Dictionary<string, string> characterMappings = new Dictionary<string, string>();
static CharacterSanitizer()
{
characterMappings.Add("é", "e");
characterMappings.Add("ê", "e");
//...
}
public static string mapCharacter(string input)
{
string output;
if (characterMappings.TryGetValue(input, out output))
{
return output;
}
else
{
return input;
}
}
}
Now you're in the position where the character mappings are part of the data, rather than the code. I've hard coded the values here, but at this point it is simple enough to store the mappings in a file, read in the file and then populate the dictionary accordingly. This way you can not only clean up the code a lot by reducing the case statement to one bit text file (outside of code) but you can modify it without needing to re-compile.
You could make a small range check and look at the ascii values.
Assuming InRange(val, min, max) checks if a number is, yep, in range..
if(InRange(System.Convert.ToInt32(valToCheck),232,235))
return 'e';
else if(InRange(System.Convert.ToInt32(valToCheck),224,229))
return 'a';
This makes the code a little confusing, and depends on the standard used, but perhaps something to consider.
This answer presumes that you are going to apply that switch statement to a string, not just to single characters (though that would also work).
The best approach seems to be the one outlined in this StackOverflow answer.
I adapted it to use LINQ:
var chars = from character in valToCheck.Normalize(NormalizationForm.FormD)
where CharUnicodeInfo.GetUnicodeCategory(character)
!= UnicodeCategory.NonSpacingMark
select character;
return string.Join("", chars).Normalize(NormalizationForm.FormC);
you'll need a using directive for System.Globalization;
Sample input:
string valToCheck = "êéÈöü";
Sample output:
eeEou
Based on Michael Kaplan's RemoveDiacritics(), you could do something like this:
static char RemoveDiacritics(char c)
{
string stFormD = c.ToString().Normalize(NormalizationForm.FormD);
StringBuilder sb = new StringBuilder();
for (int ich = 0; ich < stFormD.Length; ich++)
{
UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(stFormD[ich]);
if (uc != UnicodeCategory.NonSpacingMark)
{
sb.Append(stFormD[ich]);
}
}
return (sb.ToString()[0]);
}
switch(RemoveDiacritics(valToCheck))
{
case 'e':
//...
break;
case 'a':
//...
break;
//...
}
or, potentially even:
retval = RemoveDiacritics(valToCheck);
Use Contains instead of switch.
var retVal = String.Empty;
string es = "éêèë";
if (es.Contains(valToCheck)) retVal = "e";
//etc.

C# using numbers in an enum

This is a valid enum
public enum myEnum
{
a= 1,
b= 2,
c= 3,
d= 4,
e= 5,
f= 6,
g= 7,
h= 0xff
};
But this is not
public enum myEnum
{
1a = 1,
2a = 2,
3a = 3,
};
Is there a way I can use an number in a enum? I already have code that would populate dropdowns from enums so it would be quite handy
No identifier at all in C# may begin with a number (for lexical/parsing reasons). Consider adding a [Description] attribute to your enum values:
public enum myEnum
{
[Description("1A")]
OneA = 1,
[Description("2A")]
TwoA = 2,
[Description("3A")]
ThreeA = 3,
};
Then you can get the description from an enum value like this:
((DescriptionAttribute)Attribute.GetCustomAttribute(
typeof(myEnum).GetFields(BindingFlags.Public | BindingFlags.Static)
.Single(x => (myEnum)x.GetValue(null) == enumValue),
typeof(DescriptionAttribute))).Description
Based on XSA's comment below, I wanted to expand on how one could make this more readable. Most simply, you could just create a static (extension) method:
public static string GetDescription(this Enum value)
{
return ((DescriptionAttribute)Attribute.GetCustomAttribute(
value.GetType().GetFields(BindingFlags.Public | BindingFlags.Static)
.Single(x => x.GetValue(null).Equals(value)),
typeof(DescriptionAttribute)))?.Description ?? value.ToString();
}
It's up to you whether you want to make it an extension method, and in the implementation above, I've made it fallback to the enum's normal name if no [DescriptionAttribute] has been provided.
Now you can get the description for an enum value via:
myEnum.OneA.GetDescription()
No, there isn't. C# does not allow identifiers to start with a digit.
Application usability note: In your application you should not display code identifiers to the end-user anyway. Think of translating individual enumeration items into user-friendly displayable texts. Sooner or later you'll have to extend the enum with an item whose identifier won't be in a form displayable to the user.
UPDATE: Note that the way for attaching displayable texts to enumeration items is being discusses, for example, here.
An identifier in C# (and most languages) cannot start with a digit.
If you can modify the code that populates a dropdown with the enumeration names, you could maybe have a hack that strips off a leading underscore when populating the dropdown and define your enum like so:
public enum myEnum
{
_1a = 1,
_2a = 2,
_3a = 3
};
Or if you don't like the underscores you could come up with your own 'prefix-to-be-stripped' scheme (maybe pass the prefix to the constructor or method that will populate the dropdown from the enum).
Short and crisp 4 line code.
We simply use enums as named integer for items in code,
so any simplest way is good to go.
public enum myEnum
{
_1 = 1,
_2,
_3,
};
Also for decimal values,
public enum myEnum
{
_1_5 = 1,
_2_5,
_3_5,
};
So while using this in code,
int i = cmb1.SelectedIndex(0); // not readable
int i = cmb1.SelectedIndex( (int) myEnum._1_5); // readable
No way. A valid identifier (ie a valid enumeration member) cannot start with a digit.
Enumerations are no different than variables in terms of naming rules. Therefore, you can't start the name with a number. From this post, here are the main rules for variable naming.
The name can contain letters, digits, and the underscore character
(_).
The first character of the name must be a letter. The underscore is
also a legal first character, but its
use is not recommended at the
beginning of a name. An underscore is
often used with special commands, and
it's sometimes hard to read.
Case matters (that is, upper- and lowercase letters). C# is
case-sensitive; thus, the names count
and Count refer to two different
variables.
C# keywords can't be used as variable names. Recall that a keyword
is a word that is part of the C#
language. (A complete list of the C#
keywords can be found in Appendix B,
"C# Keywords.")
Identifiers can't start with numbers. However, they can contain numbers.
Here is what i came up with as an alternative, where I needed Enums to use in a "for" Loop and a string representation equivalent to use in a Linq query.
Create enums namespace to be used in "for" Loop.
public enum TrayLevelCodes
{
None,
_5DGS,
_5DG,
_3DGS,
_3DG,
_AADC,
_ADC,
_MAAD,
_MADC
};
Create strings based on enum created to be used for Linq query
public string _5DGS = "\"5DGS\"",
_5DG = "\"5DG\"",
_3DGS = "\"3DGS\"",
_3DG = "\"3DG\"",
_AADC = "\"AADC\"",
_ADC = "\"ADC\"",
_MAAD = "\"MAAD\"",
_MADC = "\"MADC\"";
Create function that will take an enum value as argument and return corresponding string for Linq query.
public string GetCntnrLvlDscptn(TrayLevelCodes enumCode)
{
string sCode = "";
switch (enumCode)
{
case TrayLevelCodes._5DGS:
sCode = "\"5DGS\"";
break;
case TrayLevelCodes._5DG:
sCode = "\"5DG\"";
break;
case TrayLevelCodes._3DGS:
sCode = "\"3DGS\"";
break;
case TrayLevelCodes._3DG:
sCode = "\"3DG\"";
break;
case TrayLevelCodes._AADC:
sCode = "\"AADC\"";
break;
case TrayLevelCodes._ADC:
sCode = "\"AAC\"";
break;
case TrayLevelCodes._MAAD:
sCode = "\"MAAD\"";
break;
case TrayLevelCodes._MADC:
sCode = "\"MADC\"";
break;
default:
sCode = "";
break;
}
return sCode;
}
Here is how i am using what i created above.
for (var trayLevelCode = TrayLevelCodes._5DGS; trayLevelCode <= TrayLevelCodes._MADC; trayLevelCode++)
{
var TrayLvLst = (from i in pair1.Value.AutoMap
where (i.TrayLevelCode == HTMLINFO.GetCntnrLvlDscptn(trayLevelCode))
orderby i.TrayZip, i.GroupZip
group i by i.TrayZip into subTrayLvl
select subTrayLvl).ToList();
foreach (DropShipRecord tray in TrayLvLst)
{
}
}

What is quicker, switch on string or elseif on type?

Lets say I have the option of identifying a code path to take on the basis of a string comparison or else iffing the type:
Which is quicker and why?
switch(childNode.Name)
{
case "Bob":
break;
case "Jill":
break;
case "Marko":
break;
}
if(childNode is Bob)
{
}
elseif(childNode is Jill)
{
}
else if(childNode is Marko)
{
}
Update: The main reason I ask this is because the switch statement is perculiar about what counts as a case. For example it wont allow you to use variables, only constants which get moved to the main assembly. I assumed it had this restriction due to some funky stuff it was doing. If it is only translating to elseifs (as one poster commented) then why are we not allowed variables in case statements?
Caveat: I am post-optimising. This method is called many times in a slow part of the app.
Greg's profile results are great for the exact scenario he covered, but interestingly, the relative costs of the different methods change dramatically when considering a number of different factors including the number of types being compared, and the relative frequency and any patterns in the underlying data.
The simple answer is that nobody can tell you what the performance difference is going to be in your specific scenario, you will need to measure the performance in different ways yourself in your own system to get an accurate answer.
The If/Else chain is an effective approach for a small number of type comparisons, or if you can reliably predict which few types are going to make up the majority of the ones that you see. The potential problem with the approach is that as the number of types increases, the number of comparisons that must be executed increases as well.
if I execute the following:
int value = 25124;
if(value == 0) ...
else if (value == 1) ...
else if (value == 2) ...
...
else if (value == 25124) ...
each of the previous if conditions must be evaluated before the correct block is entered. On the other hand
switch(value) {
case 0:...break;
case 1:...break;
case 2:...break;
...
case 25124:...break;
}
will perform one simple jump to the correct bit of code.
Where it gets more complicated in your example is that your other method uses a switch on strings rather than integers which gets a little more complicated. At a low level, strings can't be switched on in the same way that integer values can so the C# compiler does some magic to make this work for you.
If the switch statement is "small enough" (where the compiler does what it thinks is best automatically) switching on strings generates code that is the same as an if/else chain.
switch(someString) {
case "Foo": DoFoo(); break;
case "Bar": DoBar(); break;
default: DoOther; break;
}
is the same as:
if(someString == "Foo") {
DoFoo();
} else if(someString == "Bar") {
DoBar();
} else {
DoOther();
}
Once the list of items in the dictionary gets "big enough" the compiler will automatically create an internal dictionary that maps from the strings in the switch to an integer index and then a switch based on that index.
It looks something like this (Just imagine more entries than I am going to bother to type)
A static field is defined in a "hidden" location that is associated with the class containing the switch statement of type Dictionary<string, int> and given a mangled name
//Make sure the dictionary is loaded
if(theDictionary == null) {
//This is simplified for clarity, the actual implementation is more complex
// in order to ensure thread safety
theDictionary = new Dictionary<string,int>();
theDictionary["Foo"] = 0;
theDictionary["Bar"] = 1;
}
int switchIndex;
if(theDictionary.TryGetValue(someString, out switchIndex)) {
switch(switchIndex) {
case 0: DoFoo(); break;
case 1: DoBar(); break;
}
} else {
DoOther();
}
In some quick tests that I just ran, the If/Else method is about 3x as fast as the switch for 3 different types (where the types are randomly distributed). At 25 types the switch is faster by a small margin (16%) at 50 types the switch is more than twice as fast.
If you are going to be switching on a large number of types, I would suggest a 3rd method:
private delegate void NodeHandler(ChildNode node);
static Dictionary<RuntimeTypeHandle, NodeHandler> TypeHandleSwitcher = CreateSwitcher();
private static Dictionary<RuntimeTypeHandle, NodeHandler> CreateSwitcher()
{
var ret = new Dictionary<RuntimeTypeHandle, NodeHandler>();
ret[typeof(Bob).TypeHandle] = HandleBob;
ret[typeof(Jill).TypeHandle] = HandleJill;
ret[typeof(Marko).TypeHandle] = HandleMarko;
return ret;
}
void HandleChildNode(ChildNode node)
{
NodeHandler handler;
if (TaskHandleSwitcher.TryGetValue(Type.GetRuntimeType(node), out handler))
{
handler(node);
}
else
{
//Unexpected type...
}
}
This is similar to what Ted Elliot suggested, but the usage of runtime type handles instead of full type objects avoids the overhead of loading the type object through reflection.
Here are some quick timings on my machine:
Testing 3 iterations with 5,000,000 data elements (mode=Random) and 5 types
Method Time % of optimal
If/Else 179.67 100.00
TypeHandleDictionary 321.33 178.85
TypeDictionary 377.67 210.20
Switch 492.67 274.21
Testing 3 iterations with 5,000,000 data elements (mode=Random) and 10 types
Method Time % of optimal
If/Else 271.33 100.00
TypeHandleDictionary 312.00 114.99
TypeDictionary 374.33 137.96
Switch 490.33 180.71
Testing 3 iterations with 5,000,000 data elements (mode=Random) and 15 types
Method Time % of optimal
TypeHandleDictionary 312.00 100.00
If/Else 369.00 118.27
TypeDictionary 371.67 119.12
Switch 491.67 157.59
Testing 3 iterations with 5,000,000 data elements (mode=Random) and 20 types
Method Time % of optimal
TypeHandleDictionary 335.33 100.00
TypeDictionary 373.00 111.23
If/Else 462.67 137.97
Switch 490.33 146.22
Testing 3 iterations with 5,000,000 data elements (mode=Random) and 25 types
Method Time % of optimal
TypeHandleDictionary 319.33 100.00
TypeDictionary 371.00 116.18
Switch 483.00 151.25
If/Else 562.00 175.99
Testing 3 iterations with 5,000,000 data elements (mode=Random) and 50 types
Method Time % of optimal
TypeHandleDictionary 319.67 100.00
TypeDictionary 376.67 117.83
Switch 453.33 141.81
If/Else 1,032.67 323.04
On my machine at least, the type handle dictionary approach beats all of the others for anything over 15 different types when the distribution
of the types used as input to the method is random.
If on the other hand, the input is composed entirely of the type that is checked first in the if/else chain that method is much faster:
Testing 3 iterations with 5,000,000 data elements (mode=UniformFirst) and 50 types
Method Time % of optimal
If/Else 39.00 100.00
TypeHandleDictionary 317.33 813.68
TypeDictionary 396.00 1,015.38
Switch 403.00 1,033.33
Conversely, if the input is always the last thing in the if/else chain, it has the opposite effect:
Testing 3 iterations with 5,000,000 data elements (mode=UniformLast) and 50 types
Method Time % of optimal
TypeHandleDictionary 317.67 100.00
Switch 354.33 111.54
TypeDictionary 377.67 118.89
If/Else 1,907.67 600.52
If you can make some assumptions about your input, you might get the best performance from a hybrid approach where you perform if/else checks for the few types that are most common, and then fall back to a dictionary-driven approach if those fail.
Firstly, you're comparing apples and oranges. You'd first need to compare switch on type vs switch on string, and then if on type vs if on string, and then compare the winners.
Secondly, this is the kind of thing OO was designed for. In languages that support OO, switching on type (of any kind) is a code smell that points to poor design. The solution is to derive from a common base with an abstract or virtual method (or a similar construct, depending on your language)
eg.
class Node
{
public virtual void Action()
{
// Perform default action
}
}
class Bob : Node
{
public override void Action()
{
// Perform action for Bill
}
}
class Jill : Node
{
public override void Action()
{
// Perform action for Jill
}
}
Then, instead of doing the switch statement, you just call childNode.Action()
I just implemented a quick test application and profiled it with ANTS 4.
Spec: .Net 3.5 sp1 in 32bit Windows XP, code built in release mode.
3 million tests:
Switch: 1.842 seconds
If: 0.344 seconds.
Furthermore, the switch statement results reveal (unsurprisingly) that longer names take longer.
1 million tests
Bob: 0.612 seconds.
Jill: 0.835 seconds.
Marko: 1.093 seconds.
I looks like the "If Else" is faster, at least the the scenario I created.
class Program
{
static void Main( string[] args )
{
Bob bob = new Bob();
Jill jill = new Jill();
Marko marko = new Marko();
for( int i = 0; i < 1000000; i++ )
{
Test( bob );
Test( jill );
Test( marko );
}
}
public static void Test( ChildNode childNode )
{
TestSwitch( childNode );
TestIfElse( childNode );
}
private static void TestIfElse( ChildNode childNode )
{
if( childNode is Bob ){}
else if( childNode is Jill ){}
else if( childNode is Marko ){}
}
private static void TestSwitch( ChildNode childNode )
{
switch( childNode.Name )
{
case "Bob":
break;
case "Jill":
break;
case "Marko":
break;
}
}
}
class ChildNode { public string Name { get; set; } }
class Bob : ChildNode { public Bob(){ this.Name = "Bob"; }}
class Jill : ChildNode{public Jill(){this.Name = "Jill";}}
class Marko : ChildNode{public Marko(){this.Name = "Marko";}}
Switch statement is faster to execute than the if-else-if ladder. This is due to the compiler's ability to optimise the switch statement. In the case of the if-else-if ladder, the code must process each if statement in the order determined by the programmer. However, because each case within a switch statement does not rely on earlier cases, the compiler is able to re-order the testing in such a way as to provide the fastest execution.
If you've got the classes made, I'd suggest using a Strategy design pattern instead of switch or elseif.
Try using enumerations for each object, you can switch on enums quickly and easily.
Unless you've already written this and find you have a performance problem I wouldn't worry about which is quicker. Go with the one that's more readable. Remember, "Premature optimization is the root of all evil." - Donald Knuth
A SWITCH construct was originally intended for integer data; it's intent was to use the argument directly as a index into a "dispatch table", a table of pointers. As such, there would be a single test, then launch directly to the relevant code, rather than a series of tests.
The difficulty here is that it's use has been generalized to "string" types, which obviously cannot be used as an index, and all advantage of the SWITCH construct is lost.
If speed is your intended goal, the problem is NOT your code, but your data structure. If the "name" space is as simple as you show it, better to code it into an integer value (when data is created, for example), and use this integer in the "many times in a slow part of the app".
If the types you're switching on are primitive .NET types you can use Type.GetTypeCode(Type), but if they're custom types they will all come back as TypeCode.Object.
A dictionary with delegates or handler classes might work as well.
Dictionary<Type, HandlerDelegate> handlers = new Dictionary<Type, HandlerDelegate>();
handlers[typeof(Bob)] = this.HandleBob;
handlers[typeof(Jill)] = this.HandleJill;
handlers[typeof(Marko)] = this.HandleMarko;
handlers[childNode.GetType()](childNode);
/// ...
private void HandleBob(Node childNode) {
// code to handle Bob
}
The switch() will compile out to code equivalent to a set of else ifs. The string comparisons will be much slower than the type comparisons.
I recall reading in several reference books that the if/else branching is quicker than the switch statement. However, a bit of research on Blackwasp shows that the switch statement is actually faster:
http://www.blackwasp.co.uk/SpeedTestIfElseSwitch.aspx
In reality, if you're comparing the typical 3 to 10 (or so) statements, I seriously doubt there's any real performance gain using one or the other.
As Chris has already said, go for readability:
What is quicker, switch on string or elseif on type?
I think the main performance issue here is, that in the switch block, you compare strings, and that in the if-else block, you check for types... Those two are not the same, and therefore, I'd say you're "comparing potatoes to bananas".
I'd start by comparing this:
switch(childNode.Name)
{
case "Bob":
break;
case "Jill":
break;
case "Marko":
break;
}
if(childNode.Name == "Bob")
{}
else if(childNode.Name == "Jill")
{}
else if(childNode.Name == "Marko")
{}
I'm not sure how faster it could be the right design would be to go for polymorphism.
interface INode
{
void Action;
}
class Bob : INode
{
public void Action
{
}
}
class Jill : INode
{
public void Action
{
}
}
class Marko : INode
{
public void Action
{
}
}
//Your function:
void Do(INode childNode)
{
childNode.Action();
}
Seeing what your switch statement does will help better. If your function is not really anything about an action on the type, may be you could define an enum on each type.
enum NodeType { Bob, Jill, Marko, Default }
interface INode
{
NodeType Node { get; };
}
class Bob : INode
{
public NodeType Node { get { return NodeType.Bob; } }
}
class Jill : INode
{
public NodeType Node { get { return NodeType.Jill; } }
}
class Marko : INode
{
public NodeType Node { get { return NodeType.Marko; } }
}
//Your function:
void Do(INode childNode)
{
switch(childNode.Node)
{
case Bob:
break;
case Jill:
break;
case Marko:
break;
Default:
throw new ArgumentException();
}
}
I assume this has to be faster than both approaches in question. You might want to try abstract class route if nanoseconds does matter for you.
I created a little console to show my solution, just to highlight the speed difference. I used a different string hash algorithm as the certificate version is to slow for me on runtime and duplicates are unlikely and if so my switch statement would fail (never happened till now). My unique hash extension method is included in the code below.
I will take 29 ticks over 695 ticks any time, specially when using critical code.
With a set of strings from a given database you can create a small application to create the constant in a given file for you to use in your code, if values are added you just re-run your batch and constants are generated and picked up by the solution.
public static class StringExtention
{
public static long ToUniqueHash(this string text)
{
long value = 0;
var array = text.ToCharArray();
unchecked
{
for (int i = 0; i < array.Length; i++)
{
value = (value * 397) ^ array[i].GetHashCode();
value = (value * 397) ^ i;
}
return value;
}
}
}
public class AccountTypes
{
static void Main()
{
var sb = new StringBuilder();
sb.AppendLine($"const long ACCOUNT_TYPE = {"AccountType".ToUniqueHash()};");
sb.AppendLine($"const long NET_LIQUIDATION = {"NetLiquidation".ToUniqueHash()};");
sb.AppendLine($"const long TOTAL_CASH_VALUE = {"TotalCashValue".ToUniqueHash()};");
sb.AppendLine($"const long SETTLED_CASH = {"SettledCash".ToUniqueHash()};");
sb.AppendLine($"const long ACCRUED_CASH = {"AccruedCash".ToUniqueHash()};");
sb.AppendLine($"const long BUYING_POWER = {"BuyingPower".ToUniqueHash()};");
sb.AppendLine($"const long EQUITY_WITH_LOAN_VALUE = {"EquityWithLoanValue".ToUniqueHash()};");
sb.AppendLine($"const long PREVIOUS_EQUITY_WITH_LOAN_VALUE = {"PreviousEquityWithLoanValue".ToUniqueHash()};");
sb.AppendLine($"const long GROSS_POSITION_VALUE ={ "GrossPositionValue".ToUniqueHash()};");
sb.AppendLine($"const long REQT_EQUITY = {"ReqTEquity".ToUniqueHash()};");
sb.AppendLine($"const long REQT_MARGIN = {"ReqTMargin".ToUniqueHash()};");
sb.AppendLine($"const long SPECIAL_MEMORANDUM_ACCOUNT = {"SMA".ToUniqueHash()};");
sb.AppendLine($"const long INIT_MARGIN_REQ = { "InitMarginReq".ToUniqueHash()};");
sb.AppendLine($"const long MAINT_MARGIN_REQ = {"MaintMarginReq".ToUniqueHash()};");
sb.AppendLine($"const long AVAILABLE_FUNDS = {"AvailableFunds".ToUniqueHash()};");
sb.AppendLine($"const long EXCESS_LIQUIDITY = {"ExcessLiquidity".ToUniqueHash()};");
sb.AppendLine($"const long CUSHION = {"Cushion".ToUniqueHash()};");
sb.AppendLine($"const long FULL_INIT_MARGIN_REQ = {"FullInitMarginReq".ToUniqueHash()};");
sb.AppendLine($"const long FULL_MAINTMARGIN_REQ ={ "FullMaintMarginReq".ToUniqueHash()};");
sb.AppendLine($"const long FULL_AVAILABLE_FUNDS = {"FullAvailableFunds".ToUniqueHash()};");
sb.AppendLine($"const long FULL_EXCESS_LIQUIDITY ={ "FullExcessLiquidity".ToUniqueHash()};");
sb.AppendLine($"const long LOOK_AHEAD_INIT_MARGIN_REQ = {"LookAheadInitMarginReq".ToUniqueHash()};");
sb.AppendLine($"const long LOOK_AHEAD_MAINT_MARGIN_REQ = {"LookAheadMaintMarginReq".ToUniqueHash()};");
sb.AppendLine($"const long LOOK_AHEAD_AVAILABLE_FUNDS = {"LookAheadAvailableFunds".ToUniqueHash()};");
sb.AppendLine($"const long LOOK_AHEAD_EXCESS_LIQUIDITY = {"LookAheadExcessLiquidity".ToUniqueHash()};");
sb.AppendLine($"const long HIGHEST_SEVERITY = {"HighestSeverity".ToUniqueHash()};");
sb.AppendLine($"const long DAY_TRADES_REMAINING = {"DayTradesRemaining".ToUniqueHash()};");
sb.AppendLine($"const long LEVERAGE = {"Leverage".ToUniqueHash()};");
Console.WriteLine(sb.ToString());
Test();
}
public static void Test()
{
//generated constant values
const long ACCOUNT_TYPE = -3012481629590703298;
const long NET_LIQUIDATION = 5886477638280951639;
const long TOTAL_CASH_VALUE = 2715174589598334721;
const long SETTLED_CASH = 9013818865418133625;
const long ACCRUED_CASH = -1095823472425902515;
const long BUYING_POWER = -4447052054809609098;
const long EQUITY_WITH_LOAN_VALUE = -4088154623329785565;
const long PREVIOUS_EQUITY_WITH_LOAN_VALUE = 6224054330592996694;
const long GROSS_POSITION_VALUE = -7316842993788269735;
const long REQT_EQUITY = -7457439202928979430;
const long REQT_MARGIN = -7525806483981945115;
const long SPECIAL_MEMORANDUM_ACCOUNT = -1696406879233404584;
const long INIT_MARGIN_REQ = 4495254338330797326;
const long MAINT_MARGIN_REQ = 3923858659879350034;
const long AVAILABLE_FUNDS = 2736927433442081110;
const long EXCESS_LIQUIDITY = 5975045739561521360;
const long CUSHION = 5079153439662500166;
const long FULL_INIT_MARGIN_REQ = -6446443340724968443;
const long FULL_MAINTMARGIN_REQ = -8084126626285123011;
const long FULL_AVAILABLE_FUNDS = 1594040062751632873;
const long FULL_EXCESS_LIQUIDITY = -2360941491690082189;
const long LOOK_AHEAD_INIT_MARGIN_REQ = 5230305572167766821;
const long LOOK_AHEAD_MAINT_MARGIN_REQ = 4895875570930256738;
const long LOOK_AHEAD_AVAILABLE_FUNDS = -7687608210548571554;
const long LOOK_AHEAD_EXCESS_LIQUIDITY = -4299898188451362207;
const long HIGHEST_SEVERITY = 5831097798646393988;
const long DAY_TRADES_REMAINING = 3899479916235857560;
const long LEVERAGE = 1018053116254258495;
bool found = false;
var sValues = new string[] {
"AccountType"
,"NetLiquidation"
,"TotalCashValue"
,"SettledCash"
,"AccruedCash"
,"BuyingPower"
,"EquityWithLoanValue"
,"PreviousEquityWithLoanValue"
,"GrossPositionValue"
,"ReqTEquity"
,"ReqTMargin"
,"SMA"
,"InitMarginReq"
,"MaintMarginReq"
,"AvailableFunds"
,"ExcessLiquidity"
,"Cushion"
,"FullInitMarginReq"
,"FullMaintMarginReq"
,"FullAvailableFunds"
,"FullExcessLiquidity"
,"LookAheadInitMarginReq"
,"LookAheadMaintMarginReq"
,"LookAheadAvailableFunds"
,"LookAheadExcessLiquidity"
,"HighestSeverity"
,"DayTradesRemaining"
,"Leverage"
};
long t1, t2;
var sw = System.Diagnostics.Stopwatch.StartNew();
foreach (var name in sValues)
{
switch (name)
{
case "AccountType": found = true; break;
case "NetLiquidation": found = true; break;
case "TotalCashValue": found = true; break;
case "SettledCash": found = true; break;
case "AccruedCash": found = true; break;
case "BuyingPower": found = true; break;
case "EquityWithLoanValue": found = true; break;
case "PreviousEquityWithLoanValue": found = true; break;
case "GrossPositionValue": found = true; break;
case "ReqTEquity": found = true; break;
case "ReqTMargin": found = true; break;
case "SMA": found = true; break;
case "InitMarginReq": found = true; break;
case "MaintMarginReq": found = true; break;
case "AvailableFunds": found = true; break;
case "ExcessLiquidity": found = true; break;
case "Cushion": found = true; break;
case "FullInitMarginReq": found = true; break;
case "FullMaintMarginReq": found = true; break;
case "FullAvailableFunds": found = true; break;
case "FullExcessLiquidity": found = true; break;
case "LookAheadInitMarginReq": found = true; break;
case "LookAheadMaintMarginReq": found = true; break;
case "LookAheadAvailableFunds": found = true; break;
case "LookAheadExcessLiquidity": found = true; break;
case "HighestSeverity": found = true; break;
case "DayTradesRemaining": found = true; break;
case "Leverage": found = true; break;
default: found = false; break;
}
if (!found)
throw new NotImplementedException();
}
t1 = sw.ElapsedTicks;
sw.Restart();
foreach (var name in sValues)
{
switch (name.ToUniqueHash())
{
case ACCOUNT_TYPE:
found = true;
break;
case NET_LIQUIDATION:
found = true;
break;
case TOTAL_CASH_VALUE:
found = true;
break;
case SETTLED_CASH:
found = true;
break;
case ACCRUED_CASH:
found = true;
break;
case BUYING_POWER:
found = true;
break;
case EQUITY_WITH_LOAN_VALUE:
found = true;
break;
case PREVIOUS_EQUITY_WITH_LOAN_VALUE:
found = true;
break;
case GROSS_POSITION_VALUE:
found = true;
break;
case REQT_EQUITY:
found = true;
break;
case REQT_MARGIN:
found = true;
break;
case SPECIAL_MEMORANDUM_ACCOUNT:
found = true;
break;
case INIT_MARGIN_REQ:
found = true;
break;
case MAINT_MARGIN_REQ:
found = true;
break;
case AVAILABLE_FUNDS:
found = true;
break;
case EXCESS_LIQUIDITY:
found = true;
break;
case CUSHION:
found = true;
break;
case FULL_INIT_MARGIN_REQ:
found = true;
break;
case FULL_MAINTMARGIN_REQ:
found = true;
break;
case FULL_AVAILABLE_FUNDS:
found = true;
break;
case FULL_EXCESS_LIQUIDITY:
found = true;
break;
case LOOK_AHEAD_INIT_MARGIN_REQ:
found = true;
break;
case LOOK_AHEAD_MAINT_MARGIN_REQ:
found = true;
break;
case LOOK_AHEAD_AVAILABLE_FUNDS:
found = true;
break;
case LOOK_AHEAD_EXCESS_LIQUIDITY:
found = true;
break;
case HIGHEST_SEVERITY:
found = true;
break;
case DAY_TRADES_REMAINING:
found = true;
break;
case LEVERAGE:
found = true;
break;
default:
found = false;
break;
}
if (!found)
throw new NotImplementedException();
}
t2 = sw.ElapsedTicks;
sw.Stop();
Console.WriteLine($"String switch:{t1:N0} long switch:{t2:N0}");
var faster = (t1 > t2) ? "Slower" : "faster";
Console.WriteLine($"String switch: is {faster} than long switch: by {Math.Abs(t1-t2)} Ticks");
Console.ReadLine();
}
well it depend on language you need to test yourself to see timing that which one is fast. like in php web language if / else if is fast compare to switch so you need to find it out by running some bench basic code in your desire language.
personally i prefer if / else if for code reading as switch statements can be nightmare to read where there is big code blocks in each condition as you will have to look for break keywords it each end point manually while with if / else if due to the start and end braces its easy to trace code blocks.
php
String comparison will always rely completely on the runtime environment (unless the strings are statically allocated, though the need to compare those to each other is debatable). Type comparison, however, can be done through dynamic or static binding, and either way it's more efficient for the runtime environment than comparing individual characters in a string.
Surely the switch on String would compile down to a String comparison (one per case) which is slower than a type comparison (and far slower than the typical integer compare that is used for switch/case)?
Three thoughts:
1) If you're going to do something different based on the types of the objects, it might make sense to move that behavior into those classes. Then instead of switch or if-else, you'd just call childNode.DoSomething().
2) Comparing types will be much faster than string comparisons.
3) In the if-else design, you might be able to take advantage of reordering the tests. If "Jill" objects make up 90% of the objects going through there, test for them first.
One of the issues you have with the switch is using strings, like "Bob", this will cause a lot more cycles and lines in the compiled code. The IL that is generated will have to declare a string, set it to "Bob" then use it in the comparison. So with that in mind your IF statements will run faster.
PS. Aeon's example wont work because you can't switch on Types. (No I don't know why exactly, but we've tried it an it doesn't work. It has to do with the type being variable)
If you want to test this, just build a separate application and build two simple Methods that do what is written up above and use something like Ildasm.exe to see the IL. You'll notice a lot less lines in the IF statement Method's IL.
Ildasm comes with VisualStudio...
ILDASM page - http://msdn.microsoft.com/en-us/library/f7dy01k1(VS.80).aspx
ILDASM Tutorial - http://msdn.microsoft.com/en-us/library/aa309387(VS.71).aspx
Remember, the profiler is your friend. Any guesswork is a waste of time most of the time.
BTW, I have had a good experience with JetBrains' dotTrace profiler.
Switch on string basically gets compiled into a if-else-if ladder. Try decompiling a simple one. In any case, testing string equailty should be cheaper since they are interned and all that would be needed is a reference check. Do what makes sense in terms of maintainability; if you are compring strings, do the string switch. If you are selecting based on type, a type ladder is the more appropriate.
I kind of do it a bit different,
The strings you're switching on are going to be constants, so you can predict the values at compile time.
in your case i'd use the hash values, this is an int switch, you have 2 options, use compile time constants or calculate at run-time.
//somewhere in your code
static long _bob = "Bob".GetUniqueHashCode();
static long _jill = "Jill".GetUniqueHashCode();
static long _marko = "Marko".GeUniquetHashCode();
void MyMethod()
{
...
if(childNode.Tag==0)
childNode.Tag= childNode.Name.GetUniquetHashCode()
switch(childNode.Tag)
{
case _bob :
break;
case _jill :
break;
case _marko :
break;
}
}
The extension method for GetUniquetHashCode can be something like this:
public static class StringExtentions
{
/// <summary>
/// Return unique Int64 value for input string
/// </summary>
/// <param name="strText"></param>
/// <returns></returns>
public static Int64 GetUniquetHashCode(this string strText)
{
Int64 hashCode = 0;
if (!string.IsNullOrEmpty(strText))
{
//Unicode Encode Covering all character-set
byte[] byteContents = Encoding.Unicode.GetBytes(strText);
System.Security.Cryptography.SHA256 hash = new System.Security.Cryptography.SHA256CryptoServiceProvider();
byte[] hashText = hash.ComputeHash(byteContents);
//32Byte hashText separate
//hashCodeStart = 0~7 8Byte
//hashCodeMedium = 8~23 8Byte
//hashCodeEnd = 24~31 8Byte
//and Fold
Int64 hashCodeStart = BitConverter.ToInt64(hashText, 0);
Int64 hashCodeMedium = BitConverter.ToInt64(hashText, 8);
Int64 hashCodeEnd = BitConverter.ToInt64(hashText, 24);
hashCode = hashCodeStart ^ hashCodeMedium ^ hashCodeEnd;
}
return (hashCode);
}
}
The source of this code was published here
Please note that using Cryptography is slow, you would typically warm-up the supported string on application start, i do this my saving them at static fields as will not change and are not instance relevant. please note that I set the tag value of the node object, I could use any property or add one, just make sure that these are in sync with the actual text.
I work on low latency systems and all my codes come as a string of command:value,command:value....
now the command are all known as 64 bit integer values so switching like this saves some CPU time.
I was just reading through the list of answers here, and wanted to share this benchmark test which compares the switch construct with the if-else and ternary ? operators.
What I like about that post is it not only compares single-left constructs (eg, if-else) but double and triple level constructs (eg, if-else-if-else).
According to the results, the if-else construct was the fastest in 8/9 test cases; the switch construct tied for the fastest in 5/9 test cases.
So if you're looking for speed if-else appears to be the fastest way to go.
I may be missing something, but couldn't you do a switch statement on the type instead of the String? That is,
switch(childNode.Type)
{
case Bob:
break;
case Jill:
break;
case Marko:
break;
}

Categories

Resources