Is a switch statement ok for 30 or so conditions? - c#

I am in the final stages of creating an MP4 tag parser in .NET. Those with experience tagging music will know that there are, on average, 30 or so tags. I've tested out different approaches, and a switch statement over const values seems to be the way to go for catching the tags in the binary.
The switch lets me search the binary without needing to know what order the tags are stored in, or whether some are missing, but I wonder whether anyone would advise against using a switch statement for so many conditionals.
Any insight is much appreciated.
EDIT: One thing I should add now that we're discussing this: the function is recursive. Should I pull this conditional out and pass the data off to a method I can kill?

It'll probably work fine with the switch, but I think your function will become very long.
One way you could solve this is to create a handler class for each tag type and then register each handler with the corresponding tag in a dictionary. When you need to parse a tag, you can look up in the dictionary which handler should be used.
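A minimal sketch of what that could look like (ITagHandler, TagDispatcher and the tag strings here are hypothetical names, not from any existing library):

using System.Collections.Generic;

public interface ITagHandler
{
    void Handle(byte[] tagData);
}

public class TagDispatcher
{
    private readonly Dictionary<string, ITagHandler> handlers =
        new Dictionary<string, ITagHandler>();

    public void Register(string tag, ITagHandler handler)
    {
        handlers[tag] = handler;
    }

    public bool TryHandle(string tag, byte[] tagData)
    {
        ITagHandler handler;
        if (!handlers.TryGetValue(tag, out handler))
            return false; // unknown tag, let the caller decide what to do

        handler.Handle(tagData);
        return true;
    }
}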

Personally, if you must, I would go this way. A switch statement is much easier to read than if/else statements (and at that size it will be optimized for you).
Here is a related question. Note that the accepted answer is the incorrect one.
Is there any significant difference between using if/else and switch-case in C#?

Another option (Python-inspired) is a dictionary that maps a tag to a lambda function, or an event, or something like that. It would require some re-architecture.

For something low level like this I don't see a problem. Just make sure you place each case in a separate method. You will thank yourself later.

To me, having so many conditions in a switch statement gives me reason for thought. It might be better to refactor the code and rely on virtual methods, association between tags and methods, or any other mechanism to avoid spaghetti code.

If you have only one place that has that particular structure of switch and case statements, then it's a matter of style. If you have more than one place that has the same structure, you might want to rethink how you do it to minimize maintenance headaches.

It's hard to tell without seeing your code, but if you're just trying to catch each tag, you could define an array of acceptable tags, then loop through the file, checking to see if each tag is in the array.

ID3Sharp on SourceForge (http://sourceforge.net/projects/id3sharp/) uses a more object-oriented approach, with a FrameRegistry that hands out derived classes for each frame type.
It's fast, it works well, and it's easy to maintain. The 'overhead' of creating a small class object in C# is negligible compared to opening an MP4 file to read the header.

One design that might be useful in some cases (but from what I've seen would be overkill here):
abstract class DoStuff
{
    public void Do(type it, Context context)
    {
        switch (it)
        {
            case case1: doCase1(context); break;
            case case2: doCase2(context); break;
            //...
        }
    }

    protected abstract void doCase1(Context context);
    protected abstract void doCase2(Context context);
    //...
}

class DoRealStuff : DoStuff
{
    protected override void doCase1(Context context) { ... }
    protected override void doCase2(Context context) { ... }
    //...
}

I'm not familiar with the MP4 technology, but I would explore the possibility of using some interfaces here. Pass in an object and try to cast it to the interface.
public void SomeMethod(object obj)
{
    ITag it = obj as ITag;
    if (it != null)
    {
        it.SomeProperty = "SomeValue";
        it.DoSomethingWithTag();
    }
}

I wanted to add my own answer just to bounce it off people...
Create an object that holds the binary tag name, the data, and the property name.
Create a list of these, one for each known tag, filling in the tag name and property name.
When parsing, use LINQ to match the name found in the binary with the object's BinaryTagName and attach the data.
Then reflect into the property and assign the data...
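A rough sketch of that idea (TagMapping, TrackInfo and the property names are made up for illustration, and the decoding is simplified to UTF-8 strings):

using System.Collections.Generic;
using System.Linq;
using System.Text;

public class TagMapping
{
    public string BinaryTagName { get; set; }
    public string PropertyName { get; set; }
    public byte[] Data { get; set; }
}

public class TrackInfo
{
    public string Title { get; set; }
    public string Artist { get; set; }
}

public static class TagBinder
{
    public static void Apply(List<TagMapping> mappings, string foundTag,
                             byte[] data, TrackInfo target)
    {
        // use LINQ to match the tag found in the binary against the known mappings
        var mapping = mappings.FirstOrDefault(m => m.BinaryTagName == foundTag);
        if (mapping == null)
            return; // unknown tag, ignore it

        mapping.Data = data;

        // reflect into the named property and assign the decoded value
        var property = typeof(TrackInfo).GetProperty(mapping.PropertyName);
        if (property != null && property.PropertyType == typeof(string))
            property.SetValue(target, Encoding.UTF8.GetString(data), null);
    }
}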

What about a good old for loop? I think you can design it that way. Isn't a switch-case just a transformed if-else anyway? I always try to write code as a loop if the number of case statements gets higher than I find acceptable, and 30 cases in a switch is too high for me.

You almost certainly have a Chain of Responsibility in your problem. Refactor.

Proper approach to interfaces in C#

I've created an interface which looks like this:
interface ICsvReader
{
    List<string> ReadFromStream(Stream csvStream);
}
My question is about the return type List<string>. In tutorials I see a lot of examples where methods are just void. In those cases the interface looks natural:
interface ILogger
{
    void LogError(string error);
}
you don't tie yourself to any specific logging destination or any particular way of logging errors. Like I said, that looks natural to me, but what about returning specific types? Isn't that a bad approach? When I'm using an interface I want to create some abstraction over my methods: 'you should do this, but I don't care how'. So do you have any better idea for an interface for a file reader or something similar? I would like to read CSV from different sources but always return List<string>. Good or bad approach?
A logger is a kind of writer, so void makes sense; ICsvReader, as the name suggests, is a reader, meaning it is going to read something for you and hand it back in return.
Have you ever seen a read method with a return type of void? I can't remember one!
The only thing I would suggest is to use IEnumerable<string>. Always promise less than what you can deliver; that will help you switch to deferred execution if required in the future.
There is nothing wrong here. Since the logger does a write operation it is void; that's not your case, where you need to yield something saying "this is what I read for you".
Well, returning List<string> means that you have the whole structure in memory. For CSV files larger than 2 GB this may not be appropriate.
Another choice would be returning IEnumerable<string>; that would let a CSV reader decide whether it wants to read the whole file at once or do incremental loading and parsing. Or you could have two different classes: one that tries to load the whole file at once, and another that works step by step.
Of course, List<T> has methods and properties that IEnumerable<T> doesn't have, so you would have to decide whether this added flexibility is worth it. But I've seen a number of server-side plugins that would read gigantic files into memory in order to send them to the client, so I recommend at least thinking about this.
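For example, a hypothetical reader built around yield return (StreamCsvReader is a made-up name):

using System.Collections.Generic;
using System.IO;

public class StreamCsvReader
{
    public IEnumerable<string> ReadFromStream(Stream csvStream)
    {
        using (var reader = new StreamReader(csvStream))
        {
            string line;
            // yield one line at a time instead of building a List<string>,
            // so a multi-gigabyte file never has to fit in memory at once
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }
}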
Regarding void vs. a List return type in an interface:
I think the approach you are taking is correct. Returning a List in your case is not wrong; it's actually what your application needs, and that's why you are declaring the interface. An interface method declaration can be anything that suits your code.
As many answers here have suggested, use IEnumerable for optimization purposes.
From the question:
So do you have any better idea for an interface for a file reader or something?
Just a suggestion: do you really need to create an interface? The definition of your ReadFromStream method looks like it is going to be the same everywhere, so you may end up writing the same code in various classes. A solution would be to write the method in a base class or an abstract class (through which you will still achieve abstraction).

A "Function lookup table" in place of switches

I came across some code recently that replaces the use of switches by hard-coding a
Dictionary<string (or whatever we would've been switching on), Func<...>>
and wherever the switch would've been, it instead does dict["value"].Invoke(...).
The code feels wrong in some way, but at the same time, the methods do look a bit cleaner, especially when there's many possible cases. I can't give any rationale as to why this is good or bad design so I was hoping someone could give some reasons to support/condemn this kind of code. Is there a gain in performance? Loss of clarity?
Example:
public class A {
    ...
    public int SomeMethod(string arg) {
        ...
        switch (arg) {
            case "a": /* do stuff */ break;
            case "b": /* do other stuff */ break;
            // etc.
        }
        ...
    }
    ...
}
becomes
public class A {
    Dictionary<string, Func<int>> funcs = new Dictionary<string, Func<int>> {
        { "a", () => 0 },
        { "b", () => DoOtherStuff() },
        // ... etc.
    };

    public int SomeMethod(string arg) {
        ...
        funcs[arg].Invoke();
        ...
    }
    ...
}
Advantages:
You can change the behaviour of the "switch" at runtime
It doesn't clutter the methods using it
You can have non-literal cases (i.e. case a + b == 3) with much less hassle
Disadvantages:
All of your methods must have the same signature.
You have a change of scope: you can't use variables defined in the scope of the method unless you capture them in the lambdas, and you'll have to take care of redefining all the lambdas should you add a variable at some point
You'll have to deal with non-existent keys specifically (similar to default in a switch)
The stack trace will be more complicated if an unhandled exception bubbles up, resulting in a harder-to-debug application
Should you use it? It really depends. You'll have to define the dictionary somewhere, so the code will be cluttered by it there. You'll have to decide for yourself. If you need to switch behaviour at runtime, the dictionary solution really stands out, especially if the methods you use don't have side effects (i.e. don't need access to scoped variables).
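For the missing-key disadvantage, TryGetValue gives you something close to a default branch; a sketch reusing the funcs dictionary from the question (the -1 stands in for whatever the default case would have returned):

public int SomeMethod(string arg)
{
    Func<int> handler;
    if (!funcs.TryGetValue(arg, out handler))
    {
        return -1; // equivalent of the switch's default case
    }
    return handler();
}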
For several reasons:
Because doing it this way allows you to select what each case branch will do at runtime. Otherwise, you have to compile it in.
What's more, you can also change the number of branches at runtime.
The code looks much cleaner especially with a large number of branches, as you mention.
Why does this solution feel wrong to you? If the dictionary is populated at compile time, then you certainly don't lose any safety (the delegates that go in certainly have to compile without error). You do lose a little performance, but:
In most cases the performance loss is a non-issue
The flexibility you gain is enormous
Jon has a couple good answers. Here are some more:
Whenever you need a new case in a switch, you have to code it into that switch statement. That requires opening up that class (which previously worked just fine), adding the new code, and re-compiling and re-testing that class and any class that used it. This violates a SOLID development rule, the Open-Closed Principle (classes should be closed to modification, but open to extension). By contrast, a Dictionary of delegates allows delegates to be added, removed, and swapped out at will, without changing the code doing the selecting.
Using a Dictionary of delegates allows the code to be performed in a condition to be located anywhere, and thus given to the Dictionary from anywhere. Given this freedom, it's easy to turn the design into a Strategy pattern where each delegate is provided by a unique class that performs the logic for that case. This supports encapsulation of code and the Single Responsibility Principle (a class should do one thing, and should be the only class responsible for that thing).
If there are a large number of possible cases, it is a good idea to replace the switch statement with the Strategy pattern; see this:
Applying Strategy Pattern Instead of Using Switch Statements
No one has said anything yet about what I believe to be the single biggest drawback of this approach.
It's less maintainable.
I say this for two reasons.
It's syntactically more complex.
It requires more reasoning to understand.
Most programmers know how a switch statement works. Many programmers have never seen a Dictionary of functions.
While this might seem like an interesting and novel alternative to the switch statement and may very well be the only way to solve some problems, it is considerably more complex. If you don't need the added flexibility you shouldn't use it.
Convert your A class to a partial class, and create a second partial class in another file with just the delegate dictionary in it.
Now you can change the number of branches, and add logic to your switch statement without touching the source for the rest of your class.
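A sketch of that split, reusing the hypothetical class A from the question:

using System;
using System.Collections.Generic;

// File: A.Logic.cs
public partial class A
{
    public int SomeMethod(string arg)
    {
        return funcs[arg].Invoke();
    }
}

// File: A.Handlers.cs -- only this file changes when branches are added
public partial class A
{
    private static readonly Dictionary<string, Func<int>> funcs =
        new Dictionary<string, Func<int>>
        {
            { "a", () => 0 },
            { "b", () => 1 }
        };
}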
(Regardless of language) Performance-wise, where such code exists in a critical section, you are almost certainly better off with a function look-up table.
The reason is that you eliminate multiple runtime conditionals (the longer your switch, the more comparisons there will be) in favour of simple array indexing and function call.
The only performance downside is you've introduced the cost of a function call. This will typically be preferable to said conditionals. Profile the difference; YMMV.

Any reason not to use `new object().foo()`?

When using extremely short-lived objects that I only need to call one method on, I'm inclined to chain the method call directly to new. A very common example of this is something like the following:
string noNewlines = new Regex("\\n+").Replace(oldString, " ");
The point here is that I have no need for the Regex object after I've done the one replacement, and I like to be able to express this as a one-liner. Is there any non-obvious problem with this idiom? Some of my coworkers have expressed discomfort with it, but without anything that seemed to be like a good reason.
(I've marked this as both C# and Java, since the above idiom is common and usable in both languages.)
This particular pattern is fine -- I use it myself on occasion.
But I would not use this pattern as you have in your example. There are two alternate approaches that are better.
Better approach: Use the static method Regex.Replace(string,string,string). There is no reason to obfuscate your meaning with the new-style syntax when a static method is available that does the same thing.
Best approach: If you use the same static (not dynamically-generated) Regex from the same method, and you call this method a lot, you should store the Regex object as a private static field on the class containing the method, since this avoids parsing the expression on each call to the method.
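A minimal sketch of that best approach (TextCleaner is just a placeholder name); the pattern is parsed once and reused on every call:

using System.Text.RegularExpressions;

public class TextCleaner
{
    private static readonly Regex NewlineRuns =
        new Regex("\\n+", RegexOptions.Compiled);

    public static string RemoveNewlines(string oldString)
    {
        return NewlineRuns.Replace(oldString, " ");
    }
}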
I don't see anything wrong with this; I do this quite frequently myself.
The only exception to the rule might be for debugging purposes: it's sometimes necessary to be able to see the state of the object in the debugger, which can be difficult in a one-liner like this.
If you don't need the object afterwards, I don't see a problem - I do it myself from time to time as well. However, it can be quite hard to spot, so if your coworkers are expressing discomfort, you might need to put it into a variable so there are no hard feelings on the team. Doesn't really hurt you.
You just have to be careful when you're chaining methods of objects that implement IDisposable. Doing a single-line chain doesn't really leave room for calling Dispose or the using {...} block.
For example:
DialogResult result = new SomeCfgDialog(some_data).ShowDialog();
There is no instance variable on which to call Dispose.
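If the type is disposable, a using block keeps the call nearly as short while still disposing it (SomeCfgDialog being the hypothetical dialog from the snippet above):

DialogResult result;
using (var dialog = new SomeCfgDialog(some_data))
{
    result = dialog.ShowDialog();
}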
Then there is potential to obfuscate intent, hurt rather than improve readability and make it tougher to examine values while debugging. But those are all issues particular to the object and the situation and the number of methods chained. I don't think that there is a single reason to avoid it. Sometimes doing this will make the code more concise and readable and other times it might hurt for some of the reasons mentioned above.
As long as you're sure that the object is never needed again (or you're not creating multiple instances of an identical object), then there's no problem with it.
If the rest of your team isn't comfortable with it, though, you might want to re-think the decision. The team should set the standards and you should follow them. Be consistent. If you want to change the standard, discuss it. If they don't agree, then fall in line.
I think that's OK, and I would welcome comments/reasons to the contrary. When the object is not short-lived (or uses unmanaged resources, i.e. COM), this practice can get you into trouble.
The issue is readability.
Putting the "chained" methods on a separate line seems to be the preferred convention with my team.
string noNewlines = new Regex("\\n+")
    .Replace(oldString, " ");
One reason to avoid this style is that your coworkers might want to inspect the object in the debugger. If you compound similar instantiations, readability goes down a lot. For example:
String val = new Object1("Hello").doSomething(new Object2("interesting").withThis("input"));
Generally I prefer using a static method for the specific example you have mentioned.
The only potential problem I could see is that if, for some reason, the new Regex were null because it was not instantiated correctly, you would get a null reference exception. However, I highly doubt that, since Regex is always defined...
If you don't care about the object you invoke the method on, that's a sign that the method should probably be static.
In C#, I'd probably write an extension method to wrap the regex, so that I could write
string noNewlines = oldString.RemoveNewlines();
The extension method would look something like
using System.Text.RegularExpressions;

namespace Extensions
{
    static class SystemStringExtensions
    {
        public static string RemoveNewlines(this string inputString)
        {
            // replace newline characters with spaces
            return Regex.Replace(inputString, "\\n+", " ");
        }
    }
}
I find this much easier to read than your original example. It's also quite reusable, as stripping newline characters is one of the more common activities.

Getting my head around object oriented programming

I am an entry-level .NET developer using it to develop web sites. I started with classic ASP and last year made the jump with a short C# book.
As I developed I learned more, and started to see that, coming from classic ASP, I had always been using C# like a scripting language.
For example, in my last project I needed to encode video on the web server and wrote code like this:
public class Encoder
{
    public static bool Encode(string videopath)
    {
        ...snip...
        return true;
    }
}
While searching for samples related to my project I've seen people doing this:
public class Encoder
{
    public static Encode(string videopath)
    {
        EncodedVideo encoded = new EncodedVideo();
        ...snip...
        encoded.EncodedVideoPath = outputFile;
        encoded.Success = true;
        ...snip...
    }
}
public class EncodedVideo
{
    public string EncodedVideoPath { get; set; }
    public bool Success { get; set; }
}
As I understand it, the second example is more object-oriented, but I don't see the point of the EncodedVideo object.
Am I doing something wrong? Is it really necessary to use this sort of code in a web app?
Someone once explained OO to me with a soda can.
A soda can is an object, and an object has many properties and many methods. For example:
SodaCan.Drink();
SodaCan.Crush();
SodaCan.PourSomeForMyHomies();
etc...
The purpose of OO design is, theoretically, to write a line of code once and have abstraction between objects.
This means that Coder.Consume(SodaCan.contents); is relevant to your question.
An encoded video is not the same thing as an encoder. An encoder returns an encoded video, and an encoded video may use an encoder, but they are two separate objects. Because they are two different entities serving different functions, they simply work together.
Much like me consuming a soda can does not mean that I am a soda can.
Neither example is really complete enough to evaluate. The second example seems to be more complex than the first, but without knowing how it will be used it's difficult to tell.
Object-oriented design is at its best when it allows you to either:
1) Keep related information and/or functions together (instead of using parallel arrays or the like).
Or
2) Take advantage of inheritance and interface implementation.
Your second example MIGHT be keeping the data together better, if it returns the EncodedVideo object AND the success or failure of the method needs to be kept track of after the fact. In this case you would be replacing a combination of a boolean "success" variable and a path with a single object, clearly documenting the relation of the two pieces of data.
Another possibility not touched on by either example is using inheritance to better organize the encoding process. You could have a single base class that handles the "grunt work" of opening the file, copying the data, etc. and then inherit from that class for each different type of encoding you need to perform. In this case much of your code can be written directly against the base class, without needing to worry about what kind of encoding is actually being performed.
Actually the first looks better to me, but shouldn't return anything (or return an encoded video object).
Usually we assume methods complete successfully without exceptional errors - if exceptional errors are encountered, we throw an exception.
Object oriented programming is fundamentally about organization. You can program in an OO way even without an OO language like C#. By grouping related functions and data together, it is easier to deal with increasingly complex projects.
You aren't necessarily doing something wrong. The question of which paradigm works best is highly debatable and isn't likely to have a clear winner, as there are so many different ways to measure "good" code, e.g. maintainable, scalable, performant, re-usable, modular, etc.
It isn't necessary, but it can be useful in some cases. Take a look at various MVC examples to see OO code. Generally, OO code has the advantage of being re-usable, so that what was written for one application can be used for others over and over again. For example, look at log4net for an example of a logging framework that many people use.
The way you structure an OO program--which objects you use and how you arrange them--really depends on many factors: the age of the project, the overall size of the project, the complexity of the problem, and a bit of personal taste.
The best advice I can think of that will wrap all the reasons for OO into one quick lesson is something I picked up learning design patterns: "Encapsulate the parts that change." The value of OO is to reuse elements that will be repeated without writing additional code. But obviously you only care to "wrap up" code into objects if it will actually be reused or modified in the future, thus you should figure out what is likely to change and make objects out of it.
In your example, the reason to use the second set up may be that you can reuse the EncodedVideo object else where in the program. Anytime you need to deal with EncodedVideo, you don't concern yourself with the "how do I encode and use video", you just use the object you have and trust it to handle the logic. It may also be valuable to encapsulate the encoding logic if it's complex, and likely to change. Then you isolate changes to just one place in the code, rather than many potential places where you might have used the object.
(Brief aside: The particular example you posted isn't valid C# code. In the second example, the static method has no return type, though I assume you meant to have it return the EncodedVideo object.)
This is a design question, so the answer depends on what you need, meaning there's no right or wrong answer. The first method is simpler, but in the second case you encapsulate the result of the encoding in the EncodedVideo class, and you can easily change the logic (based on the incoming video type, for instance) in your Encoder class.
I think the first example seems more simple, except I would avoid using statics whenever possible to increase testability.
public class Encoder
{
    private string videoPath;

    public Encoder(string videoPath)
    {
        this.videoPath = videoPath;
    }

    public bool Encode()
    {
        ...snip...
        return true;
    }
}
Is OOP necessary? No.
Is OOP a good idea? Yes.
You're not necessarily doing something wrong. Maybe there's a better way, maybe not.
OOP, in general, promotes modularity, extensibility, and ease of maintenance. This goes for web applications, too.
In your specific Encoder/EncodedVideo example, I don't know if it makes sense to use two discrete objects to accomplish this task, because it depends on a lot of things.
For example, is the data stored in EncodedVideo only ever used within the Encode() method? Then it might not make sense to use a separate object.
However, if other parts of the application need to know some of the information that's in EncodedVideo, such as the path or whether the status is successful, then it's good to have an EncodedVideo object that can be passed around in the rest of the application. In this case, Encode() could return an object of type EncodedVideo rather than a bool, making that data available to the rest of your app.
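A sketch of that variant, reusing the EncodedVideo class from the question (the encoding work and the output path are placeholders):

public class Encoder
{
    public static EncodedVideo Encode(string videopath)
    {
        var encoded = new EncodedVideo();
        // ...snip: run the actual encoding here...
        encoded.EncodedVideoPath = videopath + ".encoded.mp4"; // placeholder output path
        encoded.Success = true;
        return encoded;
    }
}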
Unless you want to reuse the EncodedVideo class for something else, then (from the code you've given) I think your method is perfectly acceptable for this task. Unless there's unrelated functionality in the EncodedVideo and Encoder classes, or they form a massive lump of code that should be split up, you're not really lowering the cohesion of your classes, which is fine. Assuming you don't need to reuse EncodedVideo and the classes are cohesive, splitting them probably just creates unnecessary classes and increases coupling.
Remember: 1. the OO philosophy can be quite subjective and there's no single right answer, 2. you can always refactor later :p

Logic is now Polymorphism instead of Switch, but what about constructing?

This question is specifically regarding C#, but I am also interested in answers for C++ and Java (or even other languages if they've got something cool).
I am replacing switch statements with polymorphism in some "C using C# syntax" code I've inherited. I've been puzzling over the best way to create these objects. I have two fall-back methods I tend to use. I would like to know if there are other viable alternatives I should be considering, or just get a sanity check that I'm actually going about this in a reasonable way.
The techniques I normally use:
Use an all-knowing method/class. This class will either populate a data structure (most likely a Map) or construct the objects on the fly using a switch statement.
Use a blind-and-dumb class that uses a config file and reflection to create a map of instances/delegates/factories/etc. Then use the map in a manner similar to the above.
???
Is there a #3, #4... etc that I should strongly consider?
Some details... please note, the original design is not mine and my time is limited as far as rewriting/refactoring the entire thing.
Previous pseudo-code:
public string[] HandleMessage(object input) {
    object parser = null;
    string command = null;
    if (input is XmlMessage) {
        parser = new XmlMessageParser();
        ((XmlMessageParser)parser).setInput(input);
        command = ((XmlMessageParser)parser).getCommand();
    } else if (input is NameValuePairMessage) {
        parser = new NameValuePairMessageParser();
        ((NameValuePairMessageParser)parser).setInput(input);
        command = ((NameValuePairMessageParser)parser).getCommand();
    } else if (...) {
        //blah blah blah
    }

    string[] result = new string[3];
    switch (command) {
        case "Add":
            result = Utility.AddData(parser);
            break;
        case "Modify":
            result = Utility.ModifyData(parser);
            break;
        case ...: //blah blah
            break;
    }
    return result;
}
What I plan to replace that with (after much refactoring of the other objects) is something like:
public ResultStruct HandleMessage(IParserInput input) {
    IParser parser = this.GetParser(input.Type); //either Type or a property
    Map<string,string> parameters = parser.Parse(input);
    ICommand command = this.GetCommand(parameters); //in future, may need multiple params
    return command.Execute(parameters); //to figure out which object to return
}
The question is what should the implementation of GetParser and GetCommand be?
Putting a switch statement there (or an invocation of a factory that consists of switch statements) doesn't seem like it really fixes the problem. I'm just moving the switch somewhere else... which may be fine, as it's no longer in the middle of my primary logic.
You may want to put your parser instantiators on the objects themselves, e.g.,
public interface IParserInput
{
    ...
    IParser GetParser();
    ICommand GetCommand();
}
Any parameters that GetParser needs should, theoretically, be supplied by your object.
What will happen is that the object itself will return those, and what happens with your code is:
public ResultStruct HandleMessage(IParserInput input)
{
    IParser parser = input.GetParser();
    Map<string,string> parameters = parser.Parse(input);
    ICommand command = input.GetCommand();
    return command.Execute(parameters);
}
Now this solution is not perfect. If you do not have access to the IParserInput objects, it might not work. But at least the responsibility of providing information on the proper handler now falls with the parsee, not the handler, which seems to be more correct at this point.
You can have a
public interface IParser<SomeType> : IParser { }
and set up StructureMap to look up a parser for SomeType.
It seems that the commands are related to the parser in the existing code; if you find that clean enough for your scenario, you might want to leave it as is and just ask the parser for the command.
Update 1: I re-read the original code. I think for your scenario the least change will probably be to define an IParser as above, with the appropriate GetCommand and SetInput.
The command/input piece would look something along these lines:
public string[] HandleMessage<MessageType>(MessageType input) {
    var parser = StructureMap.GetInstance<IParser<MessageType>>();
    parser.SetInput(input);
    var command = parser.GetCommand();
    //do something about the rest
}
P.S. Actually, your implementation makes me feel that the old code had issues even beyond the if and switch. Can you provide more info on what is supposed to happen in GetCommand in your implementation? Does the command actually vary with the parameters? I am unsure what to suggest for that part without knowing.
I don't see any problems with a message handler like you have it. I certainly wouldn't go with the config file approach - why create a config file outside the debugger when you can have everything available at compile time?
The third alternative would be to discover the possible commands at runtime in a decentralized way.
For example, Spring can do this in Java using so-called "classpath scanning", reflection and annotations: Spring parses all the classes in the package(s) you specify, picks the ones annotated with @Controller, @Resource, etc. and registers them as beans.
Classpath scanning in Java relies on directory entries being added to JAR archives (so that Spring can enumerate the contents of various classpath directories).
I don't know about C#, but there should be a similar technique there: probably you can enumerate a list of classes in your assembly, and pick some of them based on some criteria (naming convention, annotation, whatever).
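A rough C# sketch of that idea, assuming the ICommand interface from the question and a purely illustrative naming convention (AddCommand -> "Add"):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class CommandRegistry
{
    // scan the current assembly for concrete ICommand implementations
    public static Dictionary<string, ICommand> Discover()
    {
        return Assembly.GetExecutingAssembly()
            .GetTypes()
            .Where(t => typeof(ICommand).IsAssignableFrom(t)
                        && !t.IsAbstract && !t.IsInterface)
            .ToDictionary(
                t => t.Name.Replace("Command", string.Empty),
                t => (ICommand)Activator.CreateInstance(t));
    }
}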
Now, this is just a third option to have in mind for the sake of having a third option in mind. I doubt it should actually be used in practice. Your first alternative (just write a piece of code that knows about all the classes) should be the default choice unless you have a compelling reason to do otherwise.
In the decentralised world of OOP, where each class is a little piece of the puzzle, there has to be some “integration code” that knows how to put these pieces together. There's nothing wrong about having such “all-knowing” classes (as long as you limit them to application-level and subsystem-level integration code only).
Whichever way you choose (hard-code the possible choices in a class, read a config file or use reflection to discover the choices), it's all the same story, does not really matter, and can easily be changed at any time.
Have fun!
