How can I create a column using a Script Component in SSIS? - c#

I have the following situation:
I need to create an SSIS project that imports data from a CSV file into our system. To do this I must read a number of columns, and one group of these columns holds the values of a planning horizon. The horizon can change from one process to the next, so one process may cover 5 months and another 15.
The CSV file will always be filled with the same first 21 columns, but I don't know how many horizon columns (22, 23, ...) will follow them.
Because of this I can't define the columns under "Inputs and Outputs" in the Script Transformation Editor; I need to create them based on the length of the horizon.
So my question is: is it possible to create a column at run time, once I have discovered the length of the horizon?
Regards

SSIS doesn't work that way: the number of columns in a data flow is fixed at design time.
If you can set a reasonable upper limit - say 50 columns - you can read the variable part of each row in as one last "column" of data and then parse it, via a Script Component, into those fields. Otherwise, you're looking at preprocessing the file to unpivot the variable-width rows into a normalized set. A sketch of the parsing approach follows.
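A minimal sketch of that parse inside the Script Component, assuming the flat file source delivers the 21 fixed columns plus one catch-all RestOfRow input column, and that output columns Horizon1..Horizon50 were added at design time (all of these names are hypothetical):

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // RestOfRow holds everything after the 21 fixed columns, still
    // comma-separated; split it into the individual horizon values.
    string[] horizon = Row.RestOfRow.Split(',');

    // Fill as many of the pre-declared output columns as we have values;
    // the rest stay NULL.
    if (horizon.Length > 0) Row.Horizon1 = horizon[0];
    if (horizon.Length > 1) Row.Horizon2 = horizon[1];
    // ... and so on, up to the agreed upper limit (Horizon50).
}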

You can do this in two different ways.
Add column(s) to a script component
https://msdn.microsoft.com/en-us/library/ms188192.aspx
Add a derived column transformation and add a custom column with the appropriate expression.

Thanks for all the answers. I changed my approach and did it differently.
I use a Script Transformation to check how many columns I need to create, then open a connection, drop the existing horizon columns, and recreate them based on the new horizon.
After that I added an Execute SQL Task that calls a procedure which does all the logic to fill the columns. A rough sketch of the drop/recreate step is below.
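A minimal sketch of that recreate step (the table and column names are hypothetical, and real code would need error handling; dropping stale columns would mirror this with ALTER TABLE ... DROP COLUMN):

using System.Data.SqlClient;

// Rebuild the horizon columns to match the horizon length found in the file.
void RecreateHorizonColumns(string connectionString, int horizonLength)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        for (int i = 1; i <= horizonLength; i++)
        {
            // Add each HorizonN column only if it does not already exist.
            string sql =
                $"IF COL_LENGTH('dbo.Staging', 'Horizon{i}') IS NULL " +
                $"ALTER TABLE dbo.Staging ADD Horizon{i} decimal(18, 2) NULL";
            using (var cmd = new SqlCommand(sql, conn))
                cmd.ExecuteNonQuery();
        }
    }
}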
Regards,

Related

SSIS Excel destination inserting null values on other columns

I have 2 Script Components which extract data from result set objects, say User::AllXData and User::AllYData.
Each is run through a Foreach Loop and the data is stored in a data table.
Next, I add the data into an Excel sheet using an Excel Destination. When I do that, all the data corresponding to column A (i.e., the data from User::AllXData) is added to the Excel sheet, but column B gets filled with null values until the end of column A's data.
Only then does column B get filled, leaving column A with null data. The two columns are supposed to be aligned.
Is there a workaround for this?
Edit:
After a lot of grinding and running many tests, I finally came across a solution.
The answer to this is pretty simple: instead of using two objects as result sets, use only one.
If you're going to query from a single source, include all the required columns in one SQL query, store the result in a single object result set, and use that as a read-only variable in the Script Component.
Then create a single data table that includes all the required columns, and add the rows to your Excel destination row by row, without any null values.
Here's an article that has a good example.
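A minimal sketch of that single-result-set pattern inside the Script Component, assuming the combined query result was stored in an object variable User::AllData by an Execute SQL Task (the variable and column names are hypothetical):

using System.Data;
using System.Data.OleDb;

public override void CreateNewOutputRows()
{
    // Load the ADO recordset held by User::AllData into one DataTable.
    var table = new DataTable();
    new OleDbDataAdapter().Fill(table, Variables.AllData);

    foreach (DataRow row in table.Rows)
    {
        Output0Buffer.AddRow();
        // Both values come from the same row, so the columns stay aligned.
        Output0Buffer.ColA = row["ColA"].ToString();
        Output0Buffer.ColB = row["ColB"].ToString();
    }
}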

SSIS: Column Size Not Changing Based on Query

I have a package. It has a query that feeds into a Script Component.
In the query I am selecting a varchar(8) column from a table and then I CAST(myDateCol AS varchar(10)).
SELECT
    myPK,
    CAST(myDateCol AS varchar(10)), -- myDateCol defined as varchar(8)
    myOtherCol
FROM
    MyServer.MySchema.MyTable
In my script, I am trying to add two characters to Row.myDateCol in Input0, but I get a buffer error in the property setter for myDateCol. You can see that setting the property to 8 characters works, but it errors out beyond that.
What I've done is add an output column with Length = 10, set that instead, and map it to the next component in the package, but that seems a little silly.
Is there a way to force the size of your input columns based on the query, OR is there a way to manually force a refresh in case the package is just stuck thinking it is dealing with a varchar(8) because the CAST operation was added later?
Additional Info:
Row.myDateCol = "20170404"
Row.myDateCol = "2017-04-04" // Errors out
This is normal behavior for SSIS. When you create a data source that uses a SQL query, SSIS looks at the query and builds the metadata for the data flow. The data source only recalculates that metadata if you change the structure of the query, for example the number of columns or their names.
The easiest way to force a refresh of the data types, without resorting to renaming columns, is to go to the Columns page of the data source editor and untick, then re-tick, the top check box of the Available External Columns list. This deselects and re-selects all the columns, refreshing the metadata at the same time. You can easily confirm this by hovering your mouse over the External\Output column names listed in the lower section.
Your problem is the result of dealing with a date(time) as text instead of as the number(s) it is, and I cannot quite tell from your question whether you want the extra characters added at the data layer (SQL) or at the application layer (C#).
Casting varchar(8) to varchar(10) still returns only 8 characters if you don't pad the value; you could try casting varchar(8) to char(10), which pads with trailing spaces.
Another option would be a double conversion of your column value: to date, and then back to your desired varchar(10).
SELECT
    myPK,
    CONVERT(varchar(10), CONVERT(date, myDateCol, 112), 120),
    myOtherCol
FROM
    MyServer.MySchema.MyTable
So, after some playing around, I found that renaming (aliasing) the column changed the size to varchar(10), as below:

SELECT
    myPK,
    CAST(myDateCol AS varchar(10)) AS DATECOL,
    myOtherCol
FROM
    MyServer.MySchema.MyTable

I then changed it back:

SELECT
    myPK,
    CAST(myDateCol AS varchar(10)),
    myOtherCol
FROM
    MyServer.MySchema.MyTable

And the change stuck. I don't know why or how, but VS/SSIS never refreshed itself to pick up the new type. I assume it has no handling for query changes after the initial query is entered unless names or aliases change.
This wasn't just my machine, either. Weird.

EPPlus: How to update Pivot Table SourceRange

I'm getting stuck trying to update the source range of a pivot table with EPPlus inside a C# class.
I've found that CacheDefinition.SourceRange contains the data source of my existing pivot table, but I don't know how to change it.
The existing pivot table's data source is a range on a data worksheet in the same Excel file.
Any advice?
Thanks in advance,
Alessandro
This might work:
You can create a self-dimensioning defined name that encompasses your data range. I use this all the time.
Open the Name Manager.
Click New.
Enter a name for your range.
Put the following in the Refers to: line:

OFFSET(DataSource!$A$1,0,0,COUNTA(DataSource!$A:$A),COUNTA(DataSource!$1:$1))

Syntax: OFFSET(reference, rows, cols, [height], [width])
Substitute your sheet/tab name for DataSource. This assumes that the table starts in cell A1, and that you want a defined name as tall as the number of values in column A and as wide as the number of values in row 1. It is a very flexible and useful method for making sure your defined names encompass all the data on the sheet.
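If you would rather set this up from code, a minimal sketch with EPPlus follows; it assumes a version of EPPlus (e.g. 4.x) where ExcelWorkbook.Names.AddFormula is available, and the file path and sheet name are placeholders:

using System.IO;
using OfficeOpenXml;

// Create the self-dimensioning defined name programmatically; the pivot
// table's source can then be pointed at "PivotSource" in Excel.
using (var package = new ExcelPackage(new FileInfo(@"C:\temp\report.xlsx")))
{
    package.Workbook.Names.AddFormula(
        "PivotSource",
        "OFFSET(DataSource!$A$1,0,0,COUNTA(DataSource!$A:$A),COUNTA(DataSource!$1:$1))");
    package.Save();
}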

Creating an ETL system (Data import and transformation)

I have been tasked with writing a module for importing data into a client's system.
I thought to break the process into 4 parts:
1. Connect to the data source (SQL, Excel, Access, CSV, ActiveDirectory, Sharepoint and Oracle) - DONE
2. Get the available tables/data groups from the source - DONE
i. Get the available fields from the selected table/data group - DONE
ii. Get all data from the selected fields - DONE
3. Transform the data to the user's requirements
4. Write the transformed data to the MSSQL target
I am trying to plan how to handle complex data transformations like:
Get column A from table tblA, inner joined to column FA from table tblB, and concatenate the two with a semicolon in between.
OR
Get column C from table tblC on the source where column tblC.D is not in column G of table tblG on the target database.
My worry is not the visual side, but how to represent such an operation in code.
I am NOT asking for sample code, but rather for some creative ideas.
The data transformation will not be free text, but drag-and-drop objects that represent actions.
I am a bit lost and need some fresh input.
Maybe you can grab some ideas from this open source project: Rhino ETL.
See my answer: Manipulate values in a datatable?
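For a flavor of the idea, here is a toy sketch (the types are hypothetical, not Rhino ETL's actual API) of modelling each drag-and-drop action as an operation that consumes a stream of rows and yields a transformed stream, so a complex transformation becomes a pipeline of small composable objects:

using System.Collections.Generic;

// Each drag-and-drop action becomes one operation over a row stream.
public interface IOperation
{
    IEnumerable<Dictionary<string, object>> Execute(
        IEnumerable<Dictionary<string, object>> rows);
}

// Example action: concatenate two columns with a semicolon in between,
// as in the tblA/tblB scenario above.
public class ConcatColumns : IOperation
{
    private readonly string _left, _right, _target;

    public ConcatColumns(string left, string right, string target)
    {
        _left = left; _right = right; _target = target;
    }

    public IEnumerable<Dictionary<string, object>> Execute(
        IEnumerable<Dictionary<string, object>> rows)
    {
        foreach (var row in rows)
        {
            row[_target] = $"{row[_left]};{row[_right]}";
            yield return row;
        }
    }
}

A whole transformation is then just an ordered list of IOperation instances applied in sequence, which maps naturally onto objects dragged onto a design surface.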

How do I programmatically verify, create, and update SQL table structure?

Scenario:
I have an application (C#) that expects a SQL database and login, which are set by a user. Once connected, it checks for the existence of several tables and creates them if they are not found.
I'd like to expand on this by having the program be capable of adding columns to those tables if I release a new version of the program which relies on the new columns.
Question:
What is the best way to programmatically check the structure of an existing SQL table and create or update it to match an expected structure?
I am planning to iterate through the list of required columns and alter the existing table whenever it does not contain the new column (sketched below). I can't help but wonder if there's an approach that is different or better.
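A sketch of that iterate-and-alter approach, checking INFORMATION_SCHEMA and altering when a column is missing (table and column names are placeholders):

using System.Collections.Generic;
using System.Data.SqlClient;

// For each required column, add it (as nullable) if the table lacks it.
static void EnsureColumns(SqlConnection conn, string tableName,
                          IEnumerable<string> requiredColumns)
{
    foreach (string col in requiredColumns)
    {
        using (var check = new SqlCommand(
            "SELECT COUNT(*) FROM INFORMATION_SCHEMA.COLUMNS " +
            "WHERE TABLE_NAME = @t AND COLUMN_NAME = @c", conn))
        {
            check.Parameters.AddWithValue("@t", tableName);
            check.Parameters.AddWithValue("@c", col);
            if ((int)check.ExecuteScalar() > 0) continue; // already exists
        }

        // Names come from the app's own required-column list, not user input.
        using (var alter = new SqlCommand(
            $"ALTER TABLE [{tableName}] ADD [{col}] float NULL", conn))
            alter.ExecuteNonQuery();
    }
}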
Criteria:
Here are some of my expectations and self-imposed rules:
Newer versions of the program might no longer use certain columns, but they would be retained for data logging purposes. In other words, no columns will be removed.
Existing data in the table must be preserved, so the table cannot simply be dropped and recreated.
In all cases, newly added columns would allow null data, so the population of old records is taken care of by having default null values.
Example:
Here is a sample table (because visual examples help!):
id  datetime         sensor_name  sensor_status  x1    x2    x3    x4
1   20100513T151907  na019        OK             0.01  0.21  1.41  1.22
2   20100513T152907  na019        OK             0.02  0.23  1.45  1.52
Then, in a new version, I may want to add the column x5. The "x-columns" are all data-storage columns that accept null.
Edit:
I updated the sample table above. It is more of a log and not a parent table. So the sensors will repeatedly show up in this logging table with the values logged. A separate parent table contains the geographic and other logistical information about the sensor, making the table I wish to modify a child table.
This is a very troublesome feature to implement, and I would advise against it. Instead, consider scripting changes using a third-party tool such as Red Gate's SQL Compare: http://www.red-gate.com/products/SQL_Compare/index.htm
If you're in doubt, consider downloading the trial version of the software and performing a structure diff on two databases with some non-trivial differences. You'll see from the result that the considerations for such operations are far from simple.
The other way around this type of issue is to redesign your database using the EAV model: http://en.wikipedia.org/wiki/Entity-attribute-value_model (it pivots attributes into rows, so you add rows instead of ever changing the structure; it has its own issues, but it's very flexible).
(To utilize a diff tool you would have to keep a copy of every version of your database and create diff scripts that get executed with new releases and upgrades. That's a huge mess of its own to maintain. EAV is the way to go for a thing like this. It wrongfully gets a lot of flak for not being as performant as a traditional database structure, but I've used it a number of times with great success. In fact, I have a HIPAA-compliant EAV database (SQL Server 2000) that has been in production for over six years, with several of the EAV tables containing tens of millions of rows, and it's still going strong with no big slowdown. Of course, we don't do heavy reporting against that database; for reports we have an export that flattens the data into a relational structure.)
The common solution I see would be to store version information somewhere in your database. Maybe have a really small table:

CREATE TABLE DB_PROPERTIES ([key] varchar(100), [value] varchar(100));
-- brackets needed because KEY is a reserved word in T-SQL
then you could add a row:
key | value
version | 12
Then you could just create a SQL update script (or set of scripts) which updates the database from version 12 to version 13:

DECLARE @v varchar(100);
SELECT @v = [value] FROM DB_PROPERTIES WHERE [key] = 'version';

IF @v = '12'
    PRINT 'do upgrade from 12 to 13';  -- replace with the actual ALTER statements
ELSE IF @v = '11'
    PRINT 'do upgrade from 11 to 13';

...and so on; depending on what upgrade paths you want to support, you could add more cases. You could also move this upgrade logic into C#, or whatever design works for you. But having the version information stored in the database itself will make it much easier to figure out what is already there than querying for all the database structures individually.
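A minimal sketch of that C# variant (connection handling, script bodies, and version numbers are placeholders, not a definitive implementation):

using System.Collections.Generic;
using System.Data.SqlClient;

// Read the stored schema version and apply upgrade scripts in order
// until the database matches the version this build expects.
class SchemaUpgrader
{
    // Script at key N moves the schema from version N to version N + 1.
    static readonly Dictionary<int, string> UpgradeScripts = new Dictionary<int, string>
    {
        { 12, "ALTER TABLE sensor_log ADD x5 float NULL" }  // 12 -> 13
    };

    public static void Upgrade(SqlConnection conn, int targetVersion)
    {
        int current = GetVersion(conn);
        while (current < targetVersion)
        {
            using (var cmd = new SqlCommand(UpgradeScripts[current], conn))
                cmd.ExecuteNonQuery();
            SetVersion(conn, ++current);
        }
    }

    static int GetVersion(SqlConnection conn)
    {
        using (var cmd = new SqlCommand(
            "SELECT [value] FROM DB_PROPERTIES WHERE [key] = 'version'", conn))
            return int.Parse((string)cmd.ExecuteScalar());
    }

    static void SetVersion(SqlConnection conn, int version)
    {
        using (var cmd = new SqlCommand(
            "UPDATE DB_PROPERTIES SET [value] = @v WHERE [key] = 'version'", conn))
        {
            cmd.Parameters.AddWithValue("@v", version.ToString());
            cmd.ExecuteNonQuery();
        }
    }
}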
If you have to build something in such a way that it relies on the application making table changes, your design is flawed. You should have a related table for the sensor values (x1, x2, etc.); then you can just add another record rather than having to create a new column.
Suggested child table structure:

READINGS
    ID            int
    Reading_type  varchar(10)
    Reading_value int

Then data in the table would read:

ID  Reading_type  Reading_value
1   x1            2
1   x2            3
1   x3            1
2   x1            7
Try Microsoft.SqlServer.Management.Smo.
This is a set of C# classes that provide an API to SQL Server database objects.
The Microsoft.SqlServer.Management.Smo.Table class has a Columns collection that will allow you to query and manipulate the columns.
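As a rough illustration, a minimal sketch assuming the SMO assemblies are referenced (server, database, table, and column names are placeholders):

using Microsoft.SqlServer.Management.Smo;

// Connect, locate the table, and add the new column only if it is missing.
var server = new Server("localhost");
var table = server.Databases["MyDb"].Tables["sensor_log"];

if (!table.Columns.Contains("x5"))
{
    var column = new Column(table, "x5", DataType.Float)
    {
        Nullable = true  // new columns allow NULL, per the criteria above
    };
    table.Columns.Add(column);
    table.Alter();  // emits the ALTER TABLE against the server
}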
Have fun.
