Best way to represent text in memory - c#

I have 2 classes that inherit from a common base class.
Each of these specialized classes load some data from a data base, process it and then save that information in text files.
The first class represents the information as an XML document.
The second class will store its information as a text file with delimiters separating fields.
What I want to do is write a single Save method in the base class that can be used by both classes. As both classes write to text files, I was thinking of using a common representation to store their data in memory - for instance, the first class would transform its XmlDocument into that common representation.
What is the best way to store this in memory - a string, a Stream?
Thanks

If the derived classes represent the data very differently, don't implement a common Save method for them; those classes know best how to save their own data.
Make Save() abstract and have each of the subclass implement the saving.
There might be something in common for doing a Save() (e.g. opening the actual file, error handling). So have your base class provide a Save() method that's responsible for that, which in turn calls a virtual Save(System.IO.TextWriter writer) method that each of your subclasses implements.
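That base-class/subclass split can be sketched like this (a minimal sketch of the template-method idea; the class and method names TextFileExporter, WriteTo, etc. are illustrative, not from the post):

```csharp
using System.IO;

public abstract class TextFileExporter
{
    // The non-virtual Save handles the shared concerns:
    // opening the file and disposing the writer.
    public void Save(string path)
    {
        using (var writer = new StreamWriter(path))
        {
            WriteTo(writer);
        }
    }

    // Each subclass knows best how to serialize its own data.
    protected abstract void WriteTo(TextWriter writer);
}

public class XmlExporter : TextFileExporter
{
    protected override void WriteTo(TextWriter writer)
    {
        writer.WriteLine("<root><item>example</item></root>");
    }
}

public class DelimitedExporter : TextFileExporter
{
    protected override void WriteTo(TextWriter writer)
    {
        writer.WriteLine("field1|field2|field3");
    }
}
```

Each class keeps its natural in-memory representation; only the file-handling boilerplate is shared.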

Given that XML is the richer of the two formats you mention, and relatively easy to manipulate, why not have a single representation and 2 save methods?

If the input data is uniformly structured, you can likely store it cleanly in a DataSet, and maybe load directly from your XML using DataSet.ReadXml on a TextReader input.
If you only have one type of record to output to the delimited textfile, DataTable could be used directly - a DataSet encapsulates multiple DataTables.
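A hedged sketch of the DataSet.ReadXml route suggested above (the XML shape here is invented for illustration; ReadXml infers the schema from element nesting):

```csharp
using System;
using System.Data;
using System.IO;

class Program
{
    static void Main()
    {
        var xml = @"<Orders>
                      <Order><Id>1</Id><Customer>Ann</Customer></Order>
                      <Order><Id>2</Id><Customer>Bob</Customer></Order>
                    </Orders>";

        var ds = new DataSet();
        ds.ReadXml(new StringReader(xml));

        // Each repeated element becomes a DataTable; each child element a column.
        DataTable orders = ds.Tables["Order"];
        foreach (DataRow row in orders.Rows)
        {
            // Write out as a delimited line, e.g. "1,Ann"
            Console.WriteLine(string.Join(",", row.ItemArray));
        }
    }
}
```

From there the delimited writer just loops rows and joins fields with whatever delimiter you choose.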
Another alternative might be to convert XML directly to CSV (comma = delimiter here, you could use whatever you wanted though) using XSLT as shown here by #Welbog.

Related

C# custom file parsing with 2 delimiters and different record types

I have a (not quite valid) CSV file that contains rows of multiple types. Any record could be one of about 6 different types and each type has a different number of properties. The first part of any row contains the timestamp and the type of record, followed by a standard CSV of the data.
Example
1456057920 PERSON, Ted Danson, 123 Fake Street, 555-123-3214, blah
1476195120 PLACE, Detroit, Michigan, 12345
1440581532 THING, Bucket, Has holes, Not a good bucket
And to make matters more complex, I need to be able to do different things with the records depending on certain criteria. So a PERSON type can be automatically inserted into a DB without user input, but a THING type would be displayed on screen for the user to review and approve before adding to DB and continuing the parse, etc.
Normally, I would use a library like CsvHelper to map the records to a type, but in this case, since the types could be different and the first part uses a space instead of a comma, I don't know how to do that with a standard CSV library. So currently, what I am doing in each loop is:
String split based off comma.
Split the first array item by the space.
Use a switch statement to determine the type and create the object.
Put that object into a List of type object.
Get confused as to where to go from there, because I now have a list of various types and will have to use yet another switch or if-chain to determine the next steps.
I don't really know for sure if I will actually need that List but I have a feeling the user will want the ability to manually flip through records in the file.
By this point, this is starting to make for very long, confusing code, and my gut feeling tells me there has to be a cleaner way to do this. I thought maybe using Type.GetType(string) would help simplify the code some, but this seems like it might be terribly inefficient in a loop with 10k+ records and might make things even more confusing. I then thought maybe making some interfaces might help, but I'm not the greatest at using interfaces in this context and I seem to end up in about this same situation.
So what would be a more manageable way to parse this file? Are there any C# parsing libraries out there that would be able to handle something like this?
You can implement an IRecord interface that has a Timestamp property and a Process method (perhaps others as well).
Then, implement concrete types for each type of record.
Use a switch statement to determine the type and create and populate the correct concrete type.
Place each object in a List<IRecord>.
After that you can do whatever you need. Some examples:
Loop through each item and call Process() to handle it.
Use LINQ .OfType<{concrete type}> to segment the list. (Warning: with 10k+ records this would be slow, since it would traverse the entire list for each concrete type.)
Use an overridden ToString method to give a single text representation of the IRecord
If using WPF, you can define a datatype template for each concrete type, bind an ItemsControl derivative to a collection of IRecords and your "detail" display (e.g. ListItem or separate ContentControl) will automagically display the item using the correct DataTemplate
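Put together, the interface plus switch might look like this (a sketch; any names beyond the question's PERSON/THING record types and the suggested IRecord/Timestamp/Process members are assumptions):

```csharp
using System;

public interface IRecord
{
    DateTimeOffset Timestamp { get; }
    void Process();
}

public class PersonRecord : IRecord
{
    public DateTimeOffset Timestamp { get; set; }
    public string Name { get; set; }
    public void Process() { /* insert into DB without user input */ }
    public override string ToString() => $"PERSON {Name}";
}

public class ThingRecord : IRecord
{
    public DateTimeOffset Timestamp { get; set; }
    public string Description { get; set; }
    public void Process() { /* queue for user review before insert */ }
    public override string ToString() => $"THING {Description}";
}

public static class RecordParser
{
    public static IRecord Parse(string line)
    {
        // "1456057920 PERSON, Ted Danson, ..." - the first token is the
        // timestamp, the second (up to the first comma) is the record type.
        var fields = line.Split(',');
        var header = fields[0].Split(' ');
        var timestamp = DateTimeOffset.FromUnixTimeSeconds(long.Parse(header[0]));

        switch (header[1])
        {
            case "PERSON":
                return new PersonRecord { Timestamp = timestamp, Name = fields[1].Trim() };
            case "THING":
                return new ThingRecord { Timestamp = timestamp, Description = fields[1].Trim() };
            default:
                throw new FormatException($"Unknown record type: {header[1]}");
        }
    }
}
```

The switch stays in one place, and everything downstream works against IRecord.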
Continuing from my comment - well, that depends. What you described is actually pretty good for starters. You can of course expand it into a series of factories, one per object type, so that you move from an explicit switch to searching for the first factory that can parse a line. That might prove useful if you are looking to add more object types in the future - you just add another factory for the new kind of object. It's up to you whether these objects should share a common interface. An interface is generally used to define a behavior, so it doesn't seem necessary here. Maybe you should rather just use a Dictionary? You need to ask yourself whether you actually need strongly typed objects here. Maybe what you need is a simple class with an ObjectType property and a Dictionary of properties, with some helper methods for easy typed property access like GetBool, GetInt or a generic Get?
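The factory-per-type idea could be sketched like this (IRecordFactory, LineParser and the matching logic are illustrative assumptions, not an established API):

```csharp
using System.Collections.Generic;
using System.Linq;

public interface IRecordFactory
{
    bool CanParse(string line);
    object Parse(string line);
}

public class PersonFactory : IRecordFactory
{
    public bool CanParse(string line) => line.Contains(" PERSON,");
    public object Parse(string line) { /* build the person object */ return line; }
}

public class LineParser
{
    private readonly List<IRecordFactory> factories;

    public LineParser(IEnumerable<IRecordFactory> factories)
        => this.factories = factories.ToList();

    // Adding a new record type later is just registering a new factory.
    public object Parse(string line)
        => factories.First(f => f.CanParse(line)).Parse(line);
}
```

The explicit switch disappears; the cost is one pass over the factory list per line.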

Software structure advice

I have posted a question on here previously asking for similar advice, but this project has evolved significantly, so I would like to ask how the experts would tackle this problem.
First, I will describe what the problem is, then how I have currently approached it. Please, I want to learn - so do criticise my approach and tell me what I can/should do better!
Requirements:
I have a log file decoder. I have three different systems generating log files. Each system is slightly different. There are seven different types of log files. Each log file can be in either ASCII format (human readable) or binary format (not human readable). So there are a lot of different logs - but many are similar. For example, for most, the binary and ascii is the same info in a different form.
There is also one log type which has a totally different structure. I.e., if a, b and c are different values, each stored 6 times, most logs are type 1; one log is type 2.
type 1: abcabcabcabcabcabc
type 2: aaaaaabbbbbbcccccc
On top of this, each system has a status register. The three systems are all different in this respect. i.e. 7 * 8 bit registers, 3 * 32 bit registers... These need processing after the log is decoded (for the logs that contain the info) and then a chart needs to be plotted for other info (where required).
So, my solution so far:
I have a LogFile struct. This contains a DataTable to contain all the data. Also contains a few strings, such as serial numbers which are read from the log files and some Enums (log type, system type, encoding format)
I have a Parser class. This has some static methods: one to identify which logs are contained within a log file (an ASCII file can contain several different ones - the GUI will find out what is in there, ask the user which one they want, and then decode it). Another static method acts as a factory and gives back an instantiation of the Parser class - there are 3 types: one generic, one for binary of type 2 (above), and one for ASCII of type 2 (above).
I have a SystemType class. This contains info such as status register meanings and log structures for each type. I.e., when decoding a type, the GUI will call GetTable, which will give back a DataTable with columns for the fields to read from the file. The Parser can then just cycle through the columns, which lets it know what type of variable to read from the file (Int, Single, String, etc).
I have a Reader class. This is abstract and has two child classes - one for ascii, one for binary. So, I can call reader.ReadInt and it will handle appropriately.
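That Reader hierarchy could look something like this (a sketch; the one-value-per-line ASCII layout and all names besides ReadInt are assumptions for illustration):

```csharp
using System.IO;

public abstract class Reader
{
    public abstract int ReadInt();
    public abstract float ReadSingle();
}

public class BinaryLogReader : Reader
{
    private readonly BinaryReader reader;
    public BinaryLogReader(Stream s) => reader = new BinaryReader(s);

    public override int ReadInt() => reader.ReadInt32();
    public override float ReadSingle() => reader.ReadSingle();
}

public class AsciiLogReader : Reader
{
    private readonly TextReader reader;
    public AsciiLogReader(TextReader r) => reader = r;

    // Assumes the ASCII logs store one value per line.
    public override int ReadInt() => int.Parse(reader.ReadLine());
    public override float ReadSingle() => float.Parse(reader.ReadLine());
}
```

The generic parser then only ever talks to the abstract Reader, so encoding differences never leak into the decoding logic.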
There is also a class to generate charts and decode the status register. Status registers are just an array of array of strings, giving name and description of each bit. Perhaps this could be a struct - but does it make a difference? There is also a further class which analyses 3 values in one particular log and if they are present, will insert a column with a value calculated from them (they are strings).
The whole thing just isn't very flexible, but I didn't want to write a different class for each of the (3*7*2 =) 42 log types! They are too similar, yet different, so I think that would have resulted in a lot of duplicate code. This is why I came up with the idea of the DataTable and a generic Parser.
So, sorry for the long text!
I have a few other questions - I have used a DataTable for the data because I use a DataGridView in the GUI to display all of this to the user. I assumed this would simplify this, but is there a better way of doing this? When I bind the DataTable to the DataGridView, I have to go through each one looking for a particular row to highlight, adding tooltips and setting various column widths, which actually takes as long as the whole decoding process. So if there is a more efficient way of doing this, it would be great!
Thanks for any feedback!! Please, I cannot have too much advice here, as I have been playing around and rearranging for ages trying to get this into a shape that I think is a nice solution, but it always seems clunky and very tightly coupled, especially with the GUI.
You probably want a class instead of a struct.
I wouldn't use a DataTable unless I had to. I would instead use a List or something similar; you can still bind this to your DataGridView. For formatting the grid, if this is an option, buy a UI control library that will give you more options than the DataGridView does. My favorite is Telerik, but there are a bunch of them. If that isn't an option, then you'll have some custom UI logic (either JavaScript or row-binding code-behind) that will look at the record you're binding and make decisions based on the properties of the class.
As far as the 42 different classes, all with similar code, create an abstract base class with the reusable code, and derive from this class in your different logtype classes, overriding the base functionality where needed.
Use interfaces to separate functionality that must be implemented by the log type, and implement those interfaces. That way, when you are iterating through a list of these classes, you know what functionality will be implemented based on the interface.
It sounds like you would greatly benefit from using interfaces to separate contract from implementation, and code to the contract to decouple your classes.
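A sketch of that base-class-plus-interface shape (all type names here are invented; the Reader stub stands in for the asker's existing Reader class):

```csharp
using System.Data;

// Stub standing in for the asker's abstract Reader class.
public abstract class Reader { }

public interface ILogParser
{
    DataTable Parse(Reader reader);
}

public abstract class LogParserBase : ILogParser
{
    public abstract DataTable Parse(Reader reader);

    // Shared, reusable decoding logic lives here.
    protected void ReadSerialNumbers(Reader reader) { /* common code */ }
}

public class Type1LogParser : LogParserBase
{
    public override DataTable Parse(Reader reader)
    {
        // type 1 layout: abcabcabc... - read records interleaved
        return new DataTable();
    }
}

public class Type2LogParser : LogParserBase
{
    public override DataTable Parse(Reader reader)
    {
        // type 2 layout: aaabbbccc... - read each field block in turn
        return new DataTable();
    }
}
```

Callers hold ILogParser references and never care which of the 42 variants they are driving.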
Hope this helps.
The only thing that pops out at me is this
I have a LogFile struct
Are you actually getting a benefit from it being a struct that outweighs the potential pitfalls?
From the guidelines:
CONSIDER defining a struct instead of a class if instances of the type are small and commonly short-lived, or are commonly embedded in other objects.
DO NOT define a struct unless the type has all of the following characteristics:
It logically represents a single value, similar to primitive types (int, double, etc.).
It has an instance size under 16 bytes.
It is immutable.
It will not have to be boxed frequently.

Should I have separate objects for Input and Output, and does this pattern have a name?

I am looking for the name of this pattern and to understand the reasons for this design.
I have:
Input data (from DB): put into an object containing data corresponding to a table row, and also the objects of its foreign keys. Data could come from the DB or be generated from code for unit testing.
A calculator: that analyses, processes and modifies this data. I want the input data to stay external to this calculator, to be able to unit test it with data generated from code or coming from an XML file instead of a DB.
Results data: then I need to update my database, BUT those results can also be validated against expected results in the case of unit testing.
Is this a Separation of Concern Pattern? Is there other known patterns involved here?
Should my result object, containing result data, sit in the same object as my input data (null at the beginning, then updated after the processing), or should the results object be totally independent from the input object - and why?
I would suggest that your output object be completely independent from your input object. Otherwise your input object has to "know" about the output object. That seems unreasonable because any time you change the output object, you by extension change the input object. Or perhaps you want multiple different output objects. Then the input object needs to know about all of the output objects, or you have an untyped reference (i.e. object in .NET) that you'll need to cast.
The input object is exactly that: it represents the data as it's received from your data store or whatever. That input data is passed through some transformation to create an output object of some kind, which might bear very little relation to the input object.
In short, there's no good reason for the input object to have any knowledge of the output object. In fact, the input object shouldn't even be concerned that the output object exists. Giving the input object that knowledge will result in unnecessary coupling, which will make it more difficult to modify your code in the future.
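A minimal sketch of that separation (all type names invented for illustration):

```csharp
public class InputRow
{
    public int Id { get; set; }
    public decimal Amount { get; set; }
}

public class Result
{
    public int Id { get; set; }
    public decimal Adjusted { get; set; }
}

public class Calculator
{
    // The calculator depends only on the input type and produces a new,
    // independent output object; neither type knows the other exists.
    public Result Process(InputRow input) =>
        new Result { Id = input.Id, Adjusted = input.Amount * 1.2m };
}
```

In a unit test you construct an InputRow directly and assert on the returned Result, never touching the DB.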
There are many different things in your list which seem too specialized to be part of one single pattern. The cornerstone of what you describe seems to be the Repository pattern, though.
At a high level, I would refer to this as "ETL", "Extract, Transform and Load". This term is more commonly used in a BI context, but it is the exact same thing conceptually.

Write DB information to a file - good approach?

I have a DB and a mapping framework which was written by the company I'm working for. So I have a class for each table in the DB, and those classes allow me to call the DB and get various information in DataSets or DataTables, for example. Now I am supposed to write that information to a TXT file in a certain format (I have the specs for what it should look like). It's about estates: there will be many estates in a file, and for each estate there are about 40 lines in the file. I know how to write to a file and such - what I am looking for is a good approach for how to build this functionality in general.
This might be too general to give good advice, but is there a proven way to do such things?
Thanks :-)
Let's call the classes that give you table info TableInfo objects.
I'd create an interface IDBInfoWriter, with a method WriteDBInfo(TableInfo ti).
Then an implementation of IDBInfoWriter, say DBInfoFileWriter, with a FileWriter as a private member. Each call to WriteDBInfo would dump whatever is needed into the file writer.
Finally, a DBInfoWalker object, which would take a list of instantiated TableInfo objects and an IDBInfoWriter:
class DBInfoWalker
{
    private readonly TableInfo[] tis;
    private readonly IDBInfoWriter idbiw;

    public DBInfoWalker(TableInfo[] tis, IDBInfoWriter idbiw)
    {
        this.tis = tis;
        this.idbiw = idbiw;
    }

    public void Process()
    {
        foreach (TableInfo ti in tis)
        {
            idbiw.WriteDBInfo(ti);
        }
    }
}
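For completeness, the interface and file-based writer described above might look like this (TableInfo's members are assumptions for illustration):

```csharp
using System.IO;

public class TableInfo
{
    public string Name { get; set; }
}

public interface IDBInfoWriter
{
    void WriteDBInfo(TableInfo ti);
}

public class DBInfoFileWriter : IDBInfoWriter
{
    private readonly TextWriter writer;   // private member, as suggested

    public DBInfoFileWriter(TextWriter writer) => this.writer = writer;

    public void WriteDBInfo(TableInfo ti)
    {
        // Dump whatever the 40-line estate spec requires for this table.
        writer.WriteLine(ti.Name);
    }
}
```

Swapping in a network or multi-file writer is then just another IDBInfoWriter implementation.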
This way you can
Work on any subset of TableInfo you want (let's say you want just the TableInfo objects beginning with "S": pass only this list in the constructor of DBInfoWalker).
Create as many output styles for your TableInfo as you like: just create the correct implementation of IDBInfoWriter (network, single file, multiple files, etc).
Of course, that's just one possibility :)
Good luck

Find number of serialized objects

My issue is trying to determine the number of objects created, the objects being deserialized from an XML document. The XML document should be set up for simplicity, so any developer can add an additional object and need no further modification to the code. However, each of these objects needs to be handled/updated separately, and specifically, some of the objects are of different sub-classes, which need to be handled differently. So what would be my simplest course of action, allowing others to add objects via the XML, but still ensuring the proper logic happens for each?
This is totally a bad idea, but if you want something constructive...
Model your XML document objects and include some kind of known syntax for you to specify Lambda expressions in it. So if you enter a
<BinaryExpression>
<NodeType>Add</NodeType>
<Left>3</Left>
<Right>4</Right>
</BinaryExpression>
Then when you read and compile the expression, you could run that code against the data in the XML object and do something (in this case, executing 3 + 4).
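A hedged sketch of compiling such a node into an executable expression tree, using System.Linq.Expressions (the XML handling is simplified to exactly the example shape above):

```csharp
using System;
using System.Linq.Expressions;
using System.Xml.Linq;

class Program
{
    static void Main()
    {
        var doc = XElement.Parse(
            "<BinaryExpression><NodeType>Add</NodeType>" +
            "<Left>3</Left><Right>4</Right></BinaryExpression>");

        // "Add" maps directly onto the ExpressionType enum.
        var nodeType = (ExpressionType)Enum.Parse(
            typeof(ExpressionType), doc.Element("NodeType").Value);
        var left  = Expression.Constant(int.Parse(doc.Element("Left").Value));
        var right = Expression.Constant(int.Parse(doc.Element("Right").Value));

        // Build (3 + 4) and compile it to a delegate.
        var lambda = Expression.Lambda<Func<int>>(
            Expression.MakeBinary(nodeType, left, right));
        Console.WriteLine(lambda.Compile()());   // 7
    }
}
```

Real documents would need recursion for nested expressions and validation of the NodeType value.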
