This question arose when I was trying to figure out a larger problem that, for simplicity's sake, I'm omitting.
I have to represent a certain data structure in C#. It's a protocol that will be used to communicate with an external system. As such, it has a sequence of fixed-length strings and integers (or other, more complicated data). Let's assume:
SYSTEM : four chars
APPLICATION : eight chars
ID : four-byte integer
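For concreteness, here is a minimal sketch of how such a header could be packed into 16 bytes and read back. It assumes ASCII text padded with trailing spaces and a little-endian integer. The question does not say what padding or byte order the external system expects, and `HeaderCodec` is a hypothetical name.

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

// Hypothetical codec for the header above:
// SYSTEM (4 chars) | APPLICATION (8 chars) | ID (4-byte int) = 16 bytes.
public static class HeaderCodec
{
    public const int SystemLength = 4;
    public const int ApplicationLength = 8;
    public const int TotalLength = SystemLength + ApplicationLength + sizeof(int);

    public static byte[] Encode(string system, string application, int id)
    {
        var buffer = new byte[TotalLength];
        WriteFixed(buffer.AsSpan(0, SystemLength), system);
        WriteFixed(buffer.AsSpan(SystemLength, ApplicationLength), application);
        // Assumption: the external system expects little-endian integers.
        BinaryPrimitives.WriteInt32LittleEndian(buffer.AsSpan(SystemLength + ApplicationLength), id);
        return buffer;
    }

    public static (string System, string Application, int Id) Decode(ReadOnlySpan<byte> data)
    {
        if (data.Length < TotalLength)
            throw new ArgumentException($"Header needs {TotalLength} bytes.", nameof(data));
        string system = Encoding.ASCII.GetString(data.Slice(0, SystemLength)).TrimEnd(' ');
        string application = Encoding.ASCII.GetString(data.Slice(SystemLength, ApplicationLength)).TrimEnd(' ');
        int id = BinaryPrimitives.ReadInt32LittleEndian(data.Slice(SystemLength + ApplicationLength));
        return (system, application, id);
    }

    // Writes ASCII text into a fixed-width slot, space-padded; rejects text that is too long.
    private static void WriteFixed(Span<byte> slot, string value)
    {
        if (value.Length > slot.Length)
            throw new ArgumentException($"'{value}' exceeds {slot.Length} characters.");
        slot.Fill((byte)' ');
        Encoding.ASCII.GetBytes(value, slot);
    }
}
```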
Now, my preferred way to represent this would be with string properties, like so:
class Message
{
    public string System { get; set; } // four characters only!
    public string Application { get; set; } // eight chars
    public int Id { get; set; }
}
Problem is: I have to ensure that the strings don't exceed their predefined lengths. Furthermore, this header will actually have tens of fields, and those will change every now and then (we are still deciding the message layout).
What is the best way to describe such a structure? I thought, for example, of using an XML file with the data description and using reflection to create a class that adheres to it (since I need to access it programmatically).
And, like I said, there is more trouble: I have other data types that limit the number of characters/digits...
For starters: the whole length issue. That's easily solved by not using auto-properties, but instead declaring your own field and writing the property the "old-fashioned" way. You can then validate your requirement in the setter, and throw an exception or discard the new value if it's invalid.
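A minimal sketch of that approach for the two string fields in the question. It chooses to throw rather than silently discard invalid values; the limits (4 and 8) come from the question, while the `CheckLength` helper is illustrative:

```csharp
using System;

// "Old-fashioned" properties: explicit backing fields, length rule enforced in each setter.
public class Message
{
    private string _system = "";
    private string _application = "";

    public string System
    {
        get => _system;
        set => _system = CheckLength(value, 4, nameof(System));
    }

    public string Application
    {
        get => _application;
        set => _application = CheckLength(value, 8, nameof(Application));
    }

    public int Id { get; set; }

    // Throws if the value is null or longer than the allowed number of characters.
    private static string CheckLength(string value, int maxLength, string propertyName)
    {
        if (value == null)
            throw new ArgumentNullException(propertyName);
        if (value.Length > maxLength)
            throw new ArgumentException(
                $"{propertyName} must be at most {maxLength} characters, got {value.Length}.",
                propertyName);
        return value;
    }
}
```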
For the changing structure: If it's not possible to just go in and alter the class, you could write a solution which uses a Dictionary (well, perhaps one per data type you want to store) to associate a name with a value. Add a file of some sort (perhaps XML) which describes the fields allowed, their type, and validation requirements.
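A rough sketch of the dictionary idea, assuming a hypothetical `FieldSpec` type that would in practice be loaded from the XML layout file (here it is built in code). It uses a single `Dictionary<string, object>` with a runtime type check rather than one dictionary per data type:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical field description, e.g. one entry per field in an XML layout file.
public sealed record FieldSpec(string Name, Type FieldType, int MaxLength = 0);

// Stores values by field name and validates them against the layout.
public sealed class DynamicMessage
{
    private readonly Dictionary<string, FieldSpec> _specs = new();
    private readonly Dictionary<string, object> _values = new();

    public DynamicMessage(IEnumerable<FieldSpec> layout)
    {
        foreach (var spec in layout)
            _specs.Add(spec.Name, spec);
    }

    public void Set(string name, object value)
    {
        if (!_specs.TryGetValue(name, out var spec))
            throw new KeyNotFoundException($"Unknown field '{name}'.");
        if (value == null || value.GetType() != spec.FieldType)
            throw new ArgumentException($"Field '{name}' expects {spec.FieldType.Name}.");
        if (value is string s && spec.MaxLength > 0 && s.Length > spec.MaxLength)
            throw new ArgumentException($"Field '{name}' allows at most {spec.MaxLength} characters.");
        _values[name] = value;
    }

    public T Get<T>(string name) => (T)_values[name];
}
```

The trade-off shows up in `Get<T>`: callers must know each field's type, and mistakes only surface at runtime rather than at compile time.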
However, if it's just changing because you haven't decided on a final structure yet, I would probably prefer just changing the class - if you don't need that sort of dynamic structure when you deploy your application, it seems like a waste of time, since you'll probably end up spending more time writing the dynamic stuff than you would altering the class.
I'm a bit confused about C#'s use of attributes. At first I thought they were simply used to give program code additional information, for example through the [Obsolete] attribute. Now I find that [DllImport] can be used to import a dynamic-link library and its functions. Can attributes import .exe files and other kinds of files?
A last question, for programmers working in C# every day: how much do you use attributes, and do you use them for anything other than adding information and importing DLLs?
Put simply, attributes are, at their core, just metadata attached to classes, methods and other code elements.
The compiler, however, reads through your code and has specific actions hardcoded for specific attributes it encounters. E.g., when it finds a DllImportAttribute on a method, it marks the method as implemented in a native library, and the runtime then resolves it to the external symbol when it is called (again, this is a very simplified explanation).
When it finds an ObsoleteAttribute, it emits a deprecation warning.
Your own attributes (which you can create with a class inheriting from the Attribute base class) will not have an effect on the default compiler. But you (or other libraries) can also scan for them at runtime, opening up many possibilities and leading to your second question:
I typically use them for metaprogramming. For example, imagine a custom network server handling packets of a specific format, implemented in different classes. Each packet format is recognized by reading an integer value. Now I need to find the correct class to instantiate for that integer.
I could do that with a switch..case or a dictionary mapping integer -> packet that I extend every time I add a packet, but that is ugly, since I have to touch code possibly far away from the actual Packet class whenever I add or delete a packet. I may not even know about the switch or dictionary if the server is implemented in a different assembly from my packets (modularity / extensibility)!
Instead, I create a custom PacketAttribute, storing an integer property set via the attribute, and decorate all my Packet classes with it. The server only has to scan through my assembly types at startup (via reflection) and build a dictionary of integer -> packet pairs automatically. Of course I could scan my assembly every time I need a packet, but that's probably a bit slow performance-wise.
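A minimal sketch of that pattern. `PacketAttribute` follows the description above; `PacketBase`, `LoginPacket`, `ChatPacket` and `PacketRegistry` are hypothetical names, and the scan runs once when `PacketRegistry` is first used rather than explicitly at server startup:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Custom attribute carrying the packet's numeric id.
[AttributeUsage(AttributeTargets.Class, Inherited = false)]
public sealed class PacketAttribute : Attribute
{
    public PacketAttribute(int id) => Id = id;
    public int Id { get; }
}

public abstract class PacketBase { }

[Packet(1)] public sealed class LoginPacket : PacketBase { }
[Packet(2)] public sealed class ChatPacket : PacketBase { }

public static class PacketRegistry
{
    // Built once by scanning the assembly for concrete [Packet] classes.
    private static readonly Dictionary<int, Type> PacketTypes = Scan(typeof(PacketBase).Assembly);

    private static Dictionary<int, Type> Scan(Assembly assembly) =>
        assembly.GetTypes()
            .Where(t => typeof(PacketBase).IsAssignableFrom(t) && !t.IsAbstract)
            .Select(t => (PacketType: t, Attr: t.GetCustomAttribute<PacketAttribute>()))
            .Where(x => x.Attr != null)
            .ToDictionary(x => x.Attr!.Id, x => x.PacketType);

    // Instantiates the class registered for the given id.
    public static PacketBase Create(int id) =>
        PacketTypes.TryGetValue(id, out var type)
            ? (PacketBase)Activator.CreateInstance(type)!
            : throw new ArgumentOutOfRangeException(nameof(id), $"No packet registered for id {id}.");
}
```

`ToDictionary` throws if two classes claim the same id, so a duplicate packet id becomes an error on first use instead of a silent mismatch.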
There are APIs that are much more attribute-heavy, like controllers in ASP.NET Core: with attributes you map full request URLs to methods in handler classes, which then execute the server code. Even URL parameters are mapped to method parameters that way.
Debuggers can also make use of attributes. For example, decorating a class with the DebuggerDisplayAttribute lets you provide a custom string that Visual Studio displays for instances of the class when you inspect them. The string has a specific format in which expressions in braces are evaluated, so it can directly show the values of important members.
As you can see, attributes can be very powerful if used well. The comments give some more references! :)
To answer the second part of your question, they are also used, for example, to set validation and display attributes for both client- and server-side use in a web application. For example:
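For example (the class and its members are hypothetical), with the attribute below the debugger should show an instance with `Id = 3` and `Name = "Login"` roughly as `Packet 3: Login` instead of the type name; the `nq` specifier drops the quotes around the string:

```csharp
using System.Diagnostics;

// Expressions in braces are evaluated by the debugger; ",nq" means "no quotes".
[DebuggerDisplay("Packet {Id}: {Name,nq}")]
public sealed class PacketInfo
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}
```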
[Display(Name = "Person's age")]
[Required(ErrorMessage = "Person's age is required")]
[Range(13, 59, ErrorMessage = "The age must be between 13 and 59")]
public int? PersonsAgeAtBooking { get; set; }
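On the server side, the same attributes can be checked explicitly with `Validator.TryValidateObject`. A minimal sketch, assuming a hypothetical `Booking` class and the standard `[Range]` attribute from `System.ComponentModel.DataAnnotations`:

```csharp
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

public class Booking
{
    [Display(Name = "Person's age")]
    [Required(ErrorMessage = "Person's age is required")]
    [Range(13, 59, ErrorMessage = "The age must be between 13 and 59")]
    public int? PersonsAgeAtBooking { get; set; }
}

public static class BookingValidator
{
    // validateAllProperties: true is needed so that [Range] is checked, not only [Required].
    public static List<ValidationResult> Validate(Booking booking)
    {
        var results = new List<ValidationResult>();
        Validator.TryValidateObject(booking, new ValidationContext(booking), results, validateAllProperties: true);
        return results;
    }
}
```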
Or to decorate enums for display purposes:
public enum YesNoOnlyEnum
{
[Description("Yes")]
Yes = 1,
[Description("No")]
No = 2
}
There are many other uses.
I have a (not quite valid) CSV file that contains rows of multiple types. Any record could be one of about 6 different types and each type has a different number of properties. The first part of any row contains the timestamp and the type of record, followed by a standard CSV of the data.
Example
1456057920 PERSON, Ted Danson, 123 Fake Street, 555-123-3214, blah
1476195120 PLACE, Detroit, Michigan, 12345
1440581532 THING, Bucket, Has holes, Not a good bucket
And to make matters more complex, I need to be able to do different things with the records depending on certain criteria. So a PERSON type can be automatically inserted into a DB without user input, but a THING type would be displayed on screen for the user to review and approve before adding to DB and continuing the parse, etc.
Normally, I would use a library like CsvHelper to map the records to a type, but in this case, since the types can differ and the first part uses a space instead of a comma, I don't know how to do that with a standard CSV library. So currently, for each line, I:
Split the string on commas.
Split the first array item on the space.
Use a switch statement to determine the type and create the object.
Put that object into a List<object>.
Get confused about where to go next, because I now have a list of various types and will have to use yet another switch or if to determine the next steps.
I don't really know for sure if I will actually need that List but I have a feeling the user will want the ability to manually flip through records in the file.
By this point, this is starting to make for very long, confusing code, and my gut feeling tells me there has to be a cleaner way to do this. I thought maybe using Type.GetType(string) would help simplify the code some, but this seems like it might be terribly inefficient in a loop with 10k+ records and might make things even more confusing. I then thought maybe making some interfaces might help, but I'm not the greatest at using interfaces in this context and I seem to end up in about this same situation.
So what would be a more manageable way to parse this file? Are there any C# parsing libraries out there that would be able to handle something like this?
You can implement an IRecord interface that has a Timestamp property and a Process method (perhaps others as well).
Then, implement concrete types for each type of record.
Use a switch statement to determine the type and create and populate the correct concrete type.
Place each object in a List<IRecord>.
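A minimal sketch of those steps for the three record types in the question's sample. The field names, the `DateTimeOffset` timestamp and the `Process` bodies (which just print) are assumptions, and the naive `Split(',')` does not handle quoted commas:

```csharp
using System;
using System.Collections.Generic;

public interface IRecord
{
    DateTimeOffset Timestamp { get; }
    void Process();
}

// Field meanings are guesses based on the sample lines; extra fields are ignored.
public sealed record PersonRecord(DateTimeOffset Timestamp, string Name, string Address, string Phone) : IRecord
{
    public void Process() => Console.WriteLine($"Inserting person {Name} into the DB");
}

public sealed record PlaceRecord(DateTimeOffset Timestamp, string City, string State, string Zip) : IRecord
{
    public void Process() => Console.WriteLine($"Inserting place {City} into the DB");
}

public sealed record ThingRecord(DateTimeOffset Timestamp, string Name, string Description, string Note) : IRecord
{
    public void Process() => Console.WriteLine($"Queueing thing {Name} for user review");
}

public static class RecordParser
{
    // Assumed line shape: "<unix seconds> <TYPE>, field, field, ..."
    public static IRecord ParseLine(string line)
    {
        string[] parts = line.Split(',');
        string[] head = parts[0].Split(' ', 2);
        var timestamp = DateTimeOffset.FromUnixTimeSeconds(long.Parse(head[0]));
        string F(int i) => parts[i].Trim();

        return head[1] switch
        {
            "PERSON" => new PersonRecord(timestamp, F(1), F(2), F(3)),
            "PLACE" => new PlaceRecord(timestamp, F(1), F(2), F(3)),
            "THING" => new ThingRecord(timestamp, F(1), F(2), F(3)),
            _ => throw new FormatException($"Unknown record type '{head[1]}'."),
        };
    }

    public static List<IRecord> ParseAll(IEnumerable<string> lines)
    {
        var records = new List<IRecord>();
        foreach (var line in lines)
            records.Add(ParseLine(line));
        return records;
    }
}
```

After parsing, `foreach (var r in records) r.Process();` handles every record without another switch, because each concrete type decides what processing means for it.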
After that you can do whatever you need. Some examples:
Loop through each item and call Process() to handle it.
Use LINQ's .OfType<{concrete type}>() to segment the list. (Warning: with 10k records this can get slow, since it traverses the entire list once for each concrete type.)
Use an overridden ToString method to give a single text representation of the IRecord
If using WPF, you can define a DataTemplate for each concrete type, bind an ItemsControl derivative to a collection of IRecords, and your "detail" display (e.g. a ListItem or a separate ContentControl) will automagically display the item using the correct DataTemplate.
Continuing from my comment: well, that depends. What you described is actually pretty good for starters. You can of course expand it into a series of factories, one for each object type, so that you move from an explicit switch to searching for the first factory that can parse a line. That might prove useful if you are looking to add more object types in the future: you just add another factory for the new kind of object. It's up to you whether these objects should share a common interface. An interface is generally used to define a behavior, and that doesn't seem to be the case here. Maybe you should just use a Dictionary? Ask yourself whether you actually need strongly typed objects here. Maybe what you need is a simple class with an ObjectType property and a Dictionary of properties, plus some helper methods for easy typed property access, like GetBool, GetInt or a generic Get.
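A rough sketch of the factory idea combined with the dictionary-based record suggested above. `GenericRecord`, `IRecordFactory`, `FieldListFactory` and `FactoryParser` are hypothetical names, and the field names per type are guesses:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Untyped record: a type name, a timestamp and named string fields.
public sealed class GenericRecord
{
    public GenericRecord(string type, long timestamp, Dictionary<string, string> fields)
    {
        Type = type;
        Timestamp = timestamp;
        Fields = fields;
    }

    public string Type { get; }
    public long Timestamp { get; }
    public Dictionary<string, string> Fields { get; }

    // Typed helper in the spirit of GetInt/GetBool.
    public int GetInt(string name) => int.Parse(Fields[name]);
}

public interface IRecordFactory
{
    bool CanParse(string type);
    GenericRecord Create(long timestamp, string[] values);
}

// One factory per record type: supporting a new type means registering a new factory.
public sealed class FieldListFactory : IRecordFactory
{
    private readonly string _type;
    private readonly string[] _fieldNames;

    public FieldListFactory(string type, params string[] fieldNames)
    {
        _type = type;
        _fieldNames = fieldNames;
    }

    public bool CanParse(string type) => type == _type;

    // Pairs field names with values by position; extra values are ignored.
    public GenericRecord Create(long timestamp, string[] values) =>
        new GenericRecord(_type, timestamp,
            _fieldNames.Zip(values, (name, value) => (name, value))
                       .ToDictionary(p => p.name, p => p.value.Trim()));
}

public static class FactoryParser
{
    // Uses the first factory that accepts the line's type instead of a switch.
    public static GenericRecord Parse(string line, IEnumerable<IRecordFactory> factories)
    {
        string[] parts = line.Split(',');
        string[] head = parts[0].Split(' ', 2);
        IRecordFactory factory = factories.FirstOrDefault(f => f.CanParse(head[1]))
            ?? throw new FormatException($"No factory for record type '{head[1]}'.");
        return factory.Create(long.Parse(head[0]), parts.Skip(1).ToArray());
    }
}
```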
I have posted a question here previously asking for similar advice, but this project has evolved significantly, so I would like to ask how the experts would tackle this problem.
First, I will describe the problem, then how I have currently approached it. I want to learn, so please do criticise my approach and tell me what I can/should do better!
Requirements:
I have a log file decoder. I have three different systems generating log files. Each system is slightly different. There are seven different types of log file. Each log file can be in either ASCII format (human readable) or binary format (not human readable). So there are a lot of different logs, but many are similar. For example, for most of them, the binary and ASCII versions hold the same info in a different form.
There is also one log type with a totally different structure. I.e., if a, b and c are different values, each stored 6 times, most logs are laid out as type 1 and one log is laid out as type 2:
type 1: abcabcabcabcabcabc
type 2: aaaaaabbbbbbcccccc
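One way to keep a single parser for both layouts is to read the raw values in file order and let only the indexing differ. A minimal sketch, assuming the number of fields per sample is known and that rows of [sample, field] are the desired output (`LogLayout` is a hypothetical name):

```csharp
using System;

public static class LogLayout
{
    // Type 1 (interleaved, abcabc...): sample s, field f is at s * fieldCount + f.
    public static T[,] FromInterleaved<T>(T[] raw, int fieldCount) =>
        ToRows(raw, fieldCount, (s, f, samples) => s * fieldCount + f);

    // Type 2 (blocked, aaa...bbb...ccc...): sample s, field f is at f * samples + s.
    public static T[,] FromBlocked<T>(T[] raw, int fieldCount) =>
        ToRows(raw, fieldCount, (s, f, samples) => f * samples + s);

    // Shared loop: only the index calculation differs between the two layouts.
    private static T[,] ToRows<T>(T[] raw, int fieldCount, Func<int, int, int, int> indexOf)
    {
        if (fieldCount <= 0 || raw.Length % fieldCount != 0)
            throw new ArgumentException("Raw length must be a multiple of the field count.");
        int samples = raw.Length / fieldCount;
        var rows = new T[samples, fieldCount];
        for (int s = 0; s < samples; s++)
            for (int f = 0; f < fieldCount; f++)
                rows[s, f] = raw[indexOf(s, f, samples)];
        return rows;
    }
}
```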
On top of this, each system has a status register, and the three systems all differ in this respect (e.g. seven 8-bit registers vs. three 32-bit registers). These need processing after the log is decoded (for the logs that contain the info), and then a chart needs to be plotted for other info (where required).
So, my solution so far:
I have a LogFile struct. It contains a DataTable holding all the data, as well as a few strings, such as serial numbers read from the log files, and some enums (log type, system type, encoding format).
I have a Parser class. It has some static methods: one to identify which logs are contained within a log file (an ASCII file can contain several different ones; the GUI finds out what is in there, asks the user which one they want and then decodes it), and another that acts as a factory and returns an instance of the Parser class. There are 3 types of parser: one generic, one for the binary form of type 2 (above) and one for the ASCII form of type 2 (above).
I have a SystemType class. It contains info such as status register meanings and the log structure for each type. E.g., when decoding a type, the GUI calls GetTable, which returns a DataTable with columns for the fields to read from the file. The Parser can then just cycle through the columns, which tells it what type of variable to read from the file (Int, Single, String, etc.).
I have a Reader class. It is abstract and has two child classes, one for ASCII and one for binary. So I can call reader.ReadInt and it will handle it appropriately.
There is also a class to generate charts and decode the status register. Status registers are just arrays of string arrays, giving the name and description of each bit. Perhaps this could be a struct, but does it make a difference? There is also a further class which analyses 3 values in one particular log and, if they are present, inserts a column with a value calculated from them (they are strings).
The whole thing just isn't very flexible, but I didn't want to write a different class for each of the (3*7*2 =) 42 log types! They are too similar, yet different, so I think that would have resulted in a lot of duplicate code. This is why I came up with the idea of the DataTable and a generic Parser.
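A minimal sketch of that Reader hierarchy, with hypothetical names (`LogReader`, `AsciiLogReader`, `BinaryLogReader`). It assumes ASCII values are whitespace-separated and binary values are little-endian, which is what `System.IO.BinaryReader` reads. The `Read(Type)` helper shows how a generic parser could pick a read method from a column's type:

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;

// Hypothetical base class: one read method per primitive type the logs contain.
public abstract class LogReader
{
    public abstract int ReadInt();
    public abstract float ReadSingle();

    // Lets a generic parser pick the read method from a column's declared type.
    public object Read(Type columnType)
    {
        if (columnType == typeof(int)) return ReadInt();
        if (columnType == typeof(float)) return ReadSingle();
        throw new NotSupportedException($"No reader for {columnType.Name}.");
    }
}

// ASCII logs: assumed to be whitespace-separated tokens.
public sealed class AsciiLogReader : LogReader
{
    private readonly string[] _tokens;
    private int _next;

    public AsciiLogReader(string text) =>
        _tokens = text.Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);

    private string NextToken() => _tokens[_next++];
    public override int ReadInt() => int.Parse(NextToken(), CultureInfo.InvariantCulture);
    public override float ReadSingle() => float.Parse(NextToken(), CultureInfo.InvariantCulture);
}

// Binary logs: assumed little-endian, which is what BinaryReader always reads.
public sealed class BinaryLogReader : LogReader
{
    private readonly BinaryReader _reader;

    public BinaryLogReader(Stream stream) =>
        _reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true);

    public override int ReadInt() => _reader.ReadInt32();
    public override float ReadSingle() => _reader.ReadSingle();
}
```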
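As an alternative to arrays of string arrays, a small named type makes each bit's meaning explicit. A sketch, assuming bit 0 is the least significant bit and registers of at most 32 bits (a system with seven 8-bit registers would use seven instances); `StatusBit` and `StatusRegister` are hypothetical:

```csharp
using System;
using System.Collections.Generic;

// One bit of a status register: a name and a description.
public sealed record StatusBit(string Name, string Description);

public sealed class StatusRegister
{
    private readonly StatusBit[] _bits; // index = bit position, bit 0 = least significant

    public StatusRegister(StatusBit[] bits)
    {
        if (bits.Length > 32)
            throw new ArgumentException("At most 32 bits are supported.", nameof(bits));
        _bits = bits;
    }

    // Yields the entry for every bit that is set in the raw register value.
    public IEnumerable<StatusBit> Decode(uint value)
    {
        for (int i = 0; i < _bits.Length; i++)
            if ((value & (1u << i)) != 0)
                yield return _bits[i];
    }
}
```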
So, sorry for the long text!
I have a few other questions - I have used a DataTable for the data because I use a DataGridView in the GUI to display all of this to the user. I assumed this would simplify this, but is there a better way of doing this? When I bind the DataTable to the DataGridView, I have to go through each one looking for a particular row to highlight, adding tooltips and setting various column widths, which actually takes as long as the whole decoding process. So if there is a more efficient way of doing this, it would be great!
Thanks for any feedback! I can't have too much advice here: I have been playing around and rearranging for ages trying to get this into a shape I think is a nice solution, but it always seems clunky and very tightly coupled, especially with the GUI.
You probably want a class instead of a struct.
I wouldn't use a DataTable unless I had to. I would instead use a List or something similar; you can still bind it to your DataGridView. For formatting the grid, if this is an option, buy a UI control library that will give you more options than the DataGridView does. My favorite is Telerik, but there are a bunch of them. If that isn't an option, then you'll have some custom UI logic (either JavaScript or row-binding code-behind) that looks at the record you're binding and makes decisions based on the properties of the class.
As far as the 42 different classes, all with similar code, create an abstract base class with the reusable code, and derive from this class in your different logtype classes, overriding the base functionality where needed.
Use interfaces to separate functionality that must be implemented by the logtype, and implement those interfaces. That way, when you are iterating through a list of these classes, you know what functionality is implemented based on the interface.
It sounds like you would greatly benefit from using interfaces to separate contract from implementation, and code to the contract to decouple your classes.
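A minimal sketch of those suggestions together: an abstract base class holding the shared decoding steps, derived classes overriding only what differs, and an interface for functionality only some log types have. All names (`LogDecoderBase`, `IHasStatusRegister`, `SystemALog`, `SystemBLog`) and the trivial decoding bodies are hypothetical:

```csharp
using System;
using System.Collections.Generic;

// Contract for functionality only some log types have.
public interface IHasStatusRegister
{
    IEnumerable<string> DecodeStatus(uint value);
}

// Shared decoding steps live here; derived classes override only what differs.
public abstract class LogDecoderBase
{
    public abstract string Name { get; }

    // The overall loop is fixed; the per-line step can be overridden.
    public IReadOnlyList<string> Decode(IEnumerable<string> rawLines)
    {
        var rows = new List<string>();
        foreach (var line in rawLines)
            rows.Add(DecodeLine(line));
        return rows;
    }

    protected virtual string DecodeLine(string line) => line.Trim();
}

public sealed class SystemALog : LogDecoderBase, IHasStatusRegister
{
    public override string Name => "System A";

    // Placeholder: pretend bit 0 means "Fault".
    public IEnumerable<string> DecodeStatus(uint value) =>
        (value & 1) != 0 ? new[] { "Fault" } : Array.Empty<string>();
}

public sealed class SystemBLog : LogDecoderBase
{
    public override string Name => "System B";

    // Overrides one step only; everything else is inherited.
    protected override string DecodeLine(string line) => line.Trim().ToUpperInvariant();
}
```

A caller can then loop over `LogDecoderBase` instances and use `if (decoder is IHasStatusRegister withStatus)` to reach the optional functionality.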
Hope this helps.
The only thing that pops out at me is this
I have a LogFile struct
Are you actually getting a benefit from it being a struct that outweighs the potential pitfalls?
From the guidelines:
CONSIDER defining a struct instead of a class if instances of the type are small and commonly short-lived or are commonly embedded in other objects.
DO NOT define a struct unless the type has all of the following characteristics:
It logically represents a single value, similar to primitive types (int, double, etc.).
It has an instance size under 16 bytes.
It is immutable.
It will not have to be boxed frequently.
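One of those pitfalls in practice: a mutable struct is copied on assignment (and when read through a `List<T>` indexer, which is why the compiler rejects `list[0].SerialNumber = "B"` for a list of structs). A minimal sketch with hypothetical `LogFileStruct` and `LogFileClass` types:

```csharp
// Value semantics: assigning a struct copies all of its fields.
public struct LogFileStruct
{
    public string SerialNumber;
}

// Reference semantics: assigning a class instance copies only the reference.
public class LogFileClass
{
    public string SerialNumber = "";
}

public static class CopyDemo
{
    public static (string StructOriginal, string ClassOriginal) Run()
    {
        var s1 = new LogFileStruct { SerialNumber = "A" };
        var s2 = s1;            // independent copy
        s2.SerialNumber = "B";  // s1.SerialNumber is still "A"

        var c1 = new LogFileClass { SerialNumber = "A" };
        var c2 = c1;            // same object
        c2.SerialNumber = "B";  // c1.SerialNumber is now "B"

        return (s1.SerialNumber, c1.SerialNumber);
    }
}
```

A struct holding a DataTable reference, several strings and some enums is also well over 16 bytes on a 64-bit runtime, since each reference alone takes 8 bytes.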
I have a table in my database called "OrderItemType" which has about 5 records for the different OrderItemTypes in my system. Each OrderItem contains an OrderItemType, and this gives me referential integrity. In my middle-tier code, I also have an enum which matches the values in this table so that I can have business logic for the different types.
My dev manager says he hates it when people do this, and I am not exactly sure why. Is there a better practice I should be following?
I do this all the time and I see nothing wrong with this. The fact of the matter is, there are values that are special to your application and your code needs to react differently to those values. Would your manager rather you hard-code an Int or a GUID to identify the Type? Or would he rather you derive a special object from OrderItem for each different Type in the database? Both of those suck much worse than an enum.
I don't see any problem with having enum values stored in the database; this actually prevents your code from dealing with invalid code types. After I started doing this, I actually had fewer problems. Does your manager offer any rationale for his hatred?
We do this, too. In our database we have an Int column that we map to an Enum value in the code.
If you have a real business concern for each of the specific types, then I would keep the enum and ditch the table in the database.
The reason behind this approach is simple:
Every time you add an OrderType, you're going to have to add business logic for it. That justifies it being in your business domain somewhere (whether it's an enum or not). However, in this case, having it in the database doesn't do anything for you.
I have seen this done for performance reasons, but I think that using a caching mechanism would be preferable in most cases.
One alternative to help keep the database values and the business logic enum values in sync would be to use the EnumBuilder class to dynamically generate a .dll containing the current enum values from the database. Your business logic could then reference it and have IntelliSense-supported, synchronized enum values.
It's actually much less complicated than it sounds.
Here's a link to MSDN to explain how to dynamically build the enum.
http://msdn.microsoft.com/en-us/library/system.reflection.emit.enumbuilder.aspx
You just have to sub in the database access code to grab the enum values:
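A minimal sketch of the idea, assuming the (name, value) rows have already been read from the OrderItemType table and are passed in as an array (`DynamicEnumFactory` is a hypothetical name). Note that this builds the enum in memory only; getting IntelliSense requires saving the generated assembly to a .dll and referencing it, which on .NET Framework goes through `AssemblyBuilderAccess.Save`/`RunAndSave` and on .NET 9 and later through `PersistedAssemblyBuilder`:

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class DynamicEnumFactory
{
    // Builds an int-backed enum type in memory from (name, value) pairs.
    public static Type BuildEnum(string enumName, (string Name, int Value)[] members)
    {
        var assemblyName = new AssemblyName("DynamicEnums");
        AssemblyBuilder assembly = AssemblyBuilder.DefineDynamicAssembly(assemblyName, AssemblyBuilderAccess.Run);
        ModuleBuilder module = assembly.DefineDynamicModule(assemblyName.Name!);
        EnumBuilder builder = module.DefineEnum(enumName, TypeAttributes.Public, typeof(int));

        // One literal per database row; the value must match the underlying type (int).
        foreach (var (name, value) in members)
            builder.DefineLiteral(name, value);

        return builder.CreateTypeInfo()!.AsType();
    }
}
```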
One more vote for you. I also map database ints to application enums, and in addition I usually describe my enums like this:
public enum Operation
{
[Description("Add item")]
AddItem = 0,
[Description("Remove item")]
RemoveItem = 1
}
This leaves me completely free to add new values without needing to change the database, and with a very short workaround I can, e.g., work with lists containing the descriptions (which are very strongly tied to the values!). Just a little bit of reflection achieves the goal!
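The "little bit of reflection" could look like this minimal sketch (the `GetDescription` extension is a hypothetical helper, not a framework method). It falls back to the value's name when no `[Description]` is present:

```csharp
using System;
using System.ComponentModel;
using System.Reflection;

public enum Operation
{
    [Description("Add item")]
    AddItem = 0,
    [Description("Remove item")]
    RemoveItem = 1
}

public static class EnumExtensions
{
    // Returns the [Description] text of an enum value, or its ToString() if none is set.
    public static string GetDescription(this Enum value)
    {
        FieldInfo field = value.GetType().GetField(value.ToString());
        var attribute = field?.GetCustomAttribute<DescriptionAttribute>();
        return attribute?.Description ?? value.ToString();
    }
}
```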
In code, you can typically just add a property like this:
public class Order
{
    public int OrderTypeInt;
    public OrderTypeEnum OrderType
    {
        get { return (OrderTypeEnum)OrderTypeInt; }
        set { OrderTypeInt = (int)value; }
    }
}
What do you think: is it a good idea to have an enum like this?
enum AvailableSpace {
Percent10,
Percent20,
SqF500,
SqF600
}
The question is about the semantics of the value names, i.e. mixing both percentages and square feet. I really believe that it's not a good idea, but I could not find any guidelines, etc. to support this.
EDIT: This will be used to determine the state of an entity, i.e. as a read-only property describing the state of an object. If we know the total space (i.e. the object itself knows it), we have the option to convert internally, so we have either only percentages, only square feet, or both. The argument is that "both" is not a good idea.
The above is just an example, of course, but the real problem is that some data providers send us totals (sq. ft.) and others send percentages, and my goal is to unify the UI. I'm free to make some approximations, so the exact values will be adapted depending on how accurately we want to present the information.
The question is only about the semantics of the value names, not the content, i.e. whether it is a good idea to put percentages into a (potentially int-backed) enum.
The answer: No, it is not a good idea to use enums for representing values. Especially values in two semantically distinct scales. You should not use enums for values.
The reason: What's the relation between the enum values from the two scales, like Percent10 and SqF600? How do you expand the list of values you can represent within your code? How do you do comparison and arithmetic operations on these values?
The suggestion (not asked for, but here it is nevertheless :-)): the semantics of what you are trying to do would be better reflected by a struct that contains two fields, one for the absolute area and one for the percentage of that absolute area that is available. With such a structure you can represent anything you can represent with the enum above. For example, data providers that give you an absolute area are represented by a struct with that area and 100% available. Data providers that give you a percentage are represented by a struct with the percentage they set and an absolute area such that the given percentage of it is the actual available area the provider wants to report. You get a "normalized" representation of the data from both types of providers, and you can add a couple of operators to enable comparison and arithmetic on instances.
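A minimal sketch of such a struct, assuming square feet as the unit and that the total area is known when a provider reports a percentage (`AvailableSpace` and its members are hypothetical):

```csharp
using System;

// Normalized "available space": a total area plus the percentage of it that is available.
public readonly struct AvailableSpace : IComparable<AvailableSpace>
{
    public AvailableSpace(double totalSquareFeet, double percentAvailable)
    {
        if (totalSquareFeet < 0) throw new ArgumentOutOfRangeException(nameof(totalSquareFeet));
        if (percentAvailable < 0 || percentAvailable > 100) throw new ArgumentOutOfRangeException(nameof(percentAvailable));
        TotalSquareFeet = totalSquareFeet;
        PercentAvailable = percentAvailable;
    }

    public double TotalSquareFeet { get; }
    public double PercentAvailable { get; }
    public double AvailableSquareFeet => TotalSquareFeet * PercentAvailable / 100.0;

    // Provider reports an absolute area: treat it as 100% of that area.
    public static AvailableSpace FromSquareFeet(double squareFeet) => new AvailableSpace(squareFeet, 100);

    // Provider reports a percentage of a known total.
    public static AvailableSpace FromPercent(double percent, double totalSquareFeet) =>
        new AvailableSpace(totalSquareFeet, percent);

    // Comparison uses the normalized available area.
    public int CompareTo(AvailableSpace other) => AvailableSquareFeet.CompareTo(other.AvailableSquareFeet);
    public static bool operator <(AvailableSpace a, AvailableSpace b) => a.CompareTo(b) < 0;
    public static bool operator >(AvailableSpace a, AvailableSpace b) => a.CompareTo(b) > 0;
}
```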
If at all possible, I'd rather break your example into two values, where your enum is "Percent" and "SquareFeet" and the second value is the quantifier. Tie these together in a struct.
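A minimal sketch of that pairing, with hypothetical `SpaceUnit` and `SpaceAmount` names; converting a percentage to square feet still needs the total area, which is assumed to be known:

```csharp
using System;

public enum SpaceUnit { Percent, SquareFeet }

// The unit and the number travel together instead of being baked into enum member names.
public readonly struct SpaceAmount
{
    public SpaceAmount(double value, SpaceUnit unit)
    {
        Value = value;
        Unit = unit;
    }

    public double Value { get; }
    public SpaceUnit Unit { get; }

    // A percentage only becomes an area once the total area is known.
    public double ToSquareFeet(double totalSquareFeet) =>
        Unit == SpaceUnit.SquareFeet ? Value : totalSquareFeet * Value / 100.0;

    public override string ToString() => Unit == SpaceUnit.Percent ? $"{Value}%" : $"{Value} sq ft";
}
```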
If the context allows for it, it may be even better to create two wrapper types "Percent" and "SquareFeet" and then create some operator overloads, so you can do things like "new SquareFeet(500) + new Percent(20);" and eliminate the use of enums.
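A minimal sketch of the wrapper-type idea. What `SquareFeet + Percent` should mean is not spelled out above, so this sketch assumes a percentage applied to an area yields that share of the area (`Percent * SquareFeet`) and only allows adding square feet to square feet:

```csharp
using System;

public readonly struct SquareFeet
{
    public SquareFeet(double value) => Value = value;
    public double Value { get; }

    // Areas can be added to areas.
    public static SquareFeet operator +(SquareFeet a, SquareFeet b) => new SquareFeet(a.Value + b.Value);

    public override string ToString() => $"{Value} sq ft";
}

public readonly struct Percent
{
    public Percent(double value) => Value = value;
    public double Value { get; }

    // Assumed meaning: a percentage applied to an area gives that share of the area.
    public static SquareFeet operator *(Percent p, SquareFeet area) => new SquareFeet(area.Value * p.Value / 100.0);
}
```

With those operators, `new SquareFeet(500) + new Percent(20) * new SquareFeet(3000)` evaluates to 1100 sq ft, while adding a `Percent` directly to `SquareFeet` does not compile.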
Update: Your naming scheme would be appropriate if the values were industry-recognized terms, almost to the point of being symbols. For example, it's safe to have an enum that contains values such as "ISO9001" rather than two values (an enum containing "ISO" and an int of 9001). It'd also be appropriate to have an enum like the one below:
public enum OperatingSystem
{
Windows95,
Windows98,
Windows2000,
WindowsXP,
WindowsVista,
MacOSClassic,
MacOSXTiger,
MacOSXLeopard
}
If the terms "Percent10" and "SqF500" are not terms of art or well defined in a spec, data dictionary, etc., then it's inappropriate to use them as values in an enum.