I am working on optimizing some code I have been assigned from a previous employee's code base. Beyond the fact that the code is pretty well "spaghettified" I did run into an issue where I'm not sure how to optimize properly.
The below snippet is not an exact replication, but should detail the question fairly well.
He is taking one DataTable from an Excel spreasheet and placing rows into a consistantly formatted DataTable which later updates the database. This seems logical to me, however, the way he is copying data seems convoluted, and is a royal pain to modify, maintain or add new formats.
Here is what I'm seeing:
private void VendorFormatOne()
{
//dtSumbit is declared with it's column schema elsewhere
for (int i = 0; i < dtFromExcelFile.Rows.Count; i++)
{
dtSubmit.Rows.Add(i);
dtSubmit.Rows[i]["reference_no"] = dtFromExcelFile.Rows[i]["VENDOR REF"];
dtSubmit.Rows[i]["customer_name"] = dtFromExcelFile.Rows[i]["END USER ID"];
//etc etc etc
}
}
To me this is completely overkill for mapping columns to a different schema, but I can't think of a way to do this more gracefully. In the actual solution, there are about 20 of these methods, all using different formats from dtFromExcelFile and the column list is much longer. The column schema of dtSubmit remains the same across the board.
I am looking for a way to avoid having to manually map these columns every time the company needs to load a new file from a vendor. Is there a way to do this more efficiently? I'm sure I'm overlooking something here, but did not find any relevant answers on SO or elsewhere.
This might be overkill, but you could define an XML file that describes which Excel column maps to which database field, then input that along with each new Excel file. You'd want to whip up a class or two for parsing and consuming that file, and perhaps another class for validating the Excel file against the XML file.
Depending on the size of your organization, this may give you the added bonus of being able to offload that tedious mapping to someone less skilled. However, it is quite a bit of setup work and if this happens only sparingly, you might not get a significant return on investment for creating so much infrastructure.
Alternatively, if you're using MS SQL Server, this is basically what SSIS is built for, though in my experience, most programmers find SSIS quite tedious.
I had originally intended this just as a comment but ran out of space. It's in reply to Micah's answer and your first comment therein.
The biggest problem here is the amount of XML mapping would equal that of the manual mapping in code
Consider building a small tool that, given an Excel file with two
columns, produces the XML mapping file. Now you can offload the
mapping work to the vendor, or an intern, or indeed anyone who has a
copy of the requirement doc for a particular vendor project.
Since the file is then loaded at runtime in your import app or
whatever, you can change the mappings without having to redeploy the
app.
Having used exactly this kind of system many, many times in the past,
I can tell you this: you will be very glad you took the time to do
it - especially the first time you get a call right after deployment
along the lines of "oops, we need to add a new column to the data
we've given you, and we realised that we've misspelled the 19th
column by the way."
About the only thing that can perhaps go wrong is data type
conversions, but you can build that into the mapping file (type
from/to) and generalise your import routine to perform the
conversions for you.
Just my 2c.
A while ago I ran into similar problem where I had over 400 columns from 30 odd tables to be mapped to about 60 in the actual table in the database. I had the same dilemma whether to go with a schema or write something custom.
There was so much duplication that I ended up writing a simple helper class with a couple of overridden methods that basically took in a column name from import table and spit out the database column name. Also, for column names, I built a separate class of the format
public static class ColumnName
{
public const string FirstName = "FirstName";
public const string LastName = "LastName";
...
}
Same thing goes for TableNames as well.
This made it much simpler to maintain table names and column names. Also, this handled duplicate columns across different tables really well avoiding duplicate code.
Related
my program (console app) is basically reading a very big csv file and process it. there are columns in the file that I feel like can be grouped together and bestserved in class
for example the first line is title, second line onward are the values. each column has this structure. so I need to group title, location of the column and values. easiest is to create a class
this is what the data look like:
title1, title2, title3, ...
1,1,2, ...
20,30,5000,...
.
.
.
class tt
{
string title;
int column;
List<int> val = new List<int>();
}
but the problem is there are some 1,000 columns , which translate to 1,000 objects. is this a good approach? not sure?
A class with 1000 members would sound.... unusual, to be blunt. Since it is unlikely that the code is going to be referring to any of those by name, I would say it would be self-defeating to create members per-value. But for "should I create a class" - well, you don't have many options - it certainly would make a very bad struct. Actually, I suspect this may be a fair scenario for DataTable - it is not something I usually recommend, but for the data you are describing, it will probably do the job fine. Yes, it has overheads - but it optimises away a number of issues - for example, by storing data internally in typed columns (rather than rows, as you might expect), it avoids having to box all the values during storage (although they still tend to get boxed during access, those boxes are collected during gen-0, so are cheap).
"Thousands" are low numbers in most computing scenarios.
Note: if you actually mean 1000 rows, and the columns are actually sane (say, less than 50, all meaningful), then I would say "sure, create a class that maps the data" - for example something like:
class Customer {
public int Id {get;set;}
public string Name {get;set;}
//...
}
There are several tools that can help you read this from SCV into the objects, typically into a List<Customer> or IEnumerable<Customer>.
but the problem is there are some 1,000 columns , which translate to 1,000 objects. is this a good approach
yes translating to 1000 objects (if you need them) should not bother you.
but, for so many columns creating classes is hell lot of work. I won't do that unless it is absolutely necessary. You can use DataTable.
If your file is a big CSV file I suggest that load them on a DataSet. I dont see any benefit in using a class for each column
The CSV looks like a transposed regular data table, therefore each column is actully a row
I have metadata in a DB table that i want to use in code.
The metadata is different sorts of Time types for reporting spent time.
The data can be:
NormalTime
OverTime
Vacation
Illness
etc
The data have a ID and a description and some other stuff.
ID = 1
Name = "Regular time"
Description = "Normal work time"
What is a good way to relate to this data in my code?
If for example i want create a method that sums all the NormalTime reported (i have another table that stores used time where the NormalTime ID and amount and some other stuff) how do i do that?
I dont want to hardcode the ID:
Select * from xyz where TimeType = 1
What i wanna do is:
Select * from xyz where TimeType = NormalTime.
Otherwise the code becomes very hard to read.
In my current solution i have hardcoded string consts that correlates to the ID.
The problem with this is if someone changes the description of the TimeType from NormalTime to something eles the hardcoded string const sais one thing and the db data sais something else.
And yes, this has happend as i dont have control over the DB content :(
So, how do I solve this in the best maintainable and readable way where changes can occur in the DB table and the code dont get very hard to read.
Where someone can add TimeTypes to the DB and later I can add methods that uses them in code.
One way to do this would be to use Visual Studio's T4 text generation templates.
(Entity Framework uses these for its code generation)
You can create a template file which contains code to pull the tables with metadata
from the database, and generates classes with static constants in.
They do need to be run whenever the database changes, though. But I think you might be able
to set them up so they do re-generate every time your code is built.
A question about T4 templates
You could have an enum type on the C# side that maps to a table in the database.
http://www.codeproject.com/Articles/41746/Mapping-NET-Enumerations-to-the-Database
I'm going to be creating competitions on the current site I'm working on. Each competition is not going to be the same and may have a varying number of input fields that a user must enter to be part of the competition eg.
Competition 1 might just require a firstname
Competition 2 might require a firstname, lastname and email address.
I will also be building a tool to observe these entries so that I can look at each individual entry.
My question is what is the best way to store an arbitrary number of fields? I was thinking of two options, one being to write each entry to a CSV file containing all the entries of the competition, the other being to have a db table with a varchar field in the database that just stores an entire entry as text. Both of these methods seem messy, is there any common practice for this sort of task?
I could in theory create a db table with a column for every possible field, but it won't work when the competition has specific requirements such as "Tell us in 100 words why..." or "Enter your 5 favourite things that.."
ANSWERED:
I have decided to use the method described below where there are multiple generic columns that can be utilized for different purposes per competition.
Initially I was going to use EAV, and I still think it might be slightly more appropriate for this specific scenario. But it is generally recommended against because of it's poor scalability and complicated querying, and I wouldn't want to get into a habit of using it. Both answers worked absolutely fine in my tests.
I think you are right to be cautious about EAV as it will make your code a bit more complex, and it will be a bit more difficult to do ad-hoc queries against the table.
I've seen many enterprise apps simply adopt something like the following schema -
t_Comp_Data
-----------
CompId
Name
Surname
Email
Field1
Field2
Field3
...
Fieldn
In this instance, the generic fields (Field1 etc) mean different things for the different competitions. For ease of querying, you might create a different view for each competition, with the proper field names aliased in.
I'm usually hesitant to use it, but this looks like a good situation for the Entity-attribute-value model if you use a database.
Basically, you have a CompetitionEntry (entity) table with the standard fields which make up every entry (Competition_id, maybe dates, etc), and then a CompetitionEntryAttribute table with CompetitionEntry_id, Attribute and Value.You probably also want another table with template attributes for each competition for creating new entries.
Unfortunately you will only be able to store one datatype, which will likely have to be a large nvarchar.
Another disadvantage is the difficulty to query against EAV databases.
Another option is to create one table per competition (possibly in code as part of the competition creation), but depending on the number of competitions this may be impractcal.
Suppose i have one table that holds Blogs.
The schema looks like :
ID (int)| Title (varchar 50) | Value (longtext) | Images (longtext)| ....
In the field Images i store an XML Serialized List of images that are associated with the blog.
Should i use another table for this purpose?
Yes, you should put the images in another table. Having several values in the same field indicates denormalized data and makes it hard to work with the database.
As with all rules, there are exceptions where it makes sense to put XML with multiple values in one field in the database. The first rule is that:
The data should always read/written together. No need to read or update just one of the values.
If that is fulfilled, there can be a number of reasons to put the data together in one field:
Storage efficiency, if space has proved to be a problem.
Retrieval efficiency, if performance has proved to be a problem.
Schema flexilibity; where one XML field can eliminate tens or hundreds of different tables.
I would certainly use another table. If you use XML, what happens when you need to go through and update the references to all images? (Would you just rather do an Update blog_images Set ..., or parse through the XML for each row, make the update, then re-generate the updated XML for each?
Well, it is a bit "inner platform", but it will work. A separate table would allow better image querying, although on some RDBMS platforms this could also be achieved via an XML-type column and SQL/XML.
If this data only has to be opaque storage, then maybe. However, keep in mind you'll generally have to bring back the entire XML to the app-tier to do anything interesting with it (or: depending on platform, use SQL/XML, but I advise against this, as the DB isn't the place to do such processing in most cases).
My advice in all other cases: separate table.
That depends on whether you'd need to query on the actual image data itself. If you see a possible need to query on certain images, or images with certain attributes, then it would probably be best to store that image data in a different way.
Otherwise, leave it the way it is.
But remember, only include the fields in your SELECT when you need them.
Should i use another table for this purpose?
Not necessarily. You just have to ensure that you are not selecting the images field in your queries when you don't need it. But if you wanted to denormalize your schema you could use another table and when you need the images perform a join.
I’m currently working on a project where we need to archive and trace all the modified data’s.
When a modification surrender, we have to kept theses information
Who has modified the data?
When?
And … that’s why I’m asking this question: Keep the previous
and the new value of the data.
Quickly, I have to trace every modification for every data.
Example :
I have a name field why the value “Morgan”.
When I modify this value, I have to be able to say to the user that the 6th of January, by XXX, the value changed from “Morgan” to “Robert” …
I have to find a clean and generic method to do this because a large amount of data is concerned by this behavior.
My program is in C# (.NET 4) and we are using Sql Server 2008 R2 and NHibernate for the object mapping.
Do you any ideas, experience or solution about how to do a thing like that?
I am a little confused about at what point you want to have the old vs new data available. But, this can be done within a database trigger as in the following question:
trigger-insert-old-values-values-that-was-updated
NHibernate Envers its what you want :)
You must use NHibernate 3.2+ (3.2 is the current release).
Its easy like
enversConf.Audit<Person>();
You can get info here and here
I've been in the same situation as you. I ended up doing in this way:
Save an ActivityEntry in the database containing an identity column (if you have multiple objects that change), an action-indicator (could be "User changed firstname", as a int), date field, userId and most important a parameter field.
Combining the values from the parameter field and the action-indicator I'm able to make strings like "{0} changed {1}'s firstname from {2} to {3}" where my parameter values could be "John;Joe".
I know it feels kinda wrong saving these totally loosely typed values in the database, but I believe it's the only way around, without having a copy of each table.