I am working with some massive datasets and I am using MySqlBulkLoader to import the data into my database. However, I am having issues with MySqlBulkLoader overwriting my timestamp column, which is auto-updated. For context, assume I have the following structure:
Timestamp  | Name | Age | Height
-----------+------+-----+-------
2011-01-01 | Jeff | 30  | 183
2012-02-03 | Bob  | 55  | 165
2016-04-05 | Sue  | 33  | 155
The data file I am loading with MySqlBulkLoader is a CSV which contains only the last three columns, as shown below:
Name,Age,Height
Jeff,30,183
Bob,55,165
Sue,33,155
The issue I am having is that the first column in the CSV writes into the first column of the database. So I need to be able to either:
Ignore the first column on bulk insert, or
Bulk insert into a subset of columns.
Thanks, your assistance is appreciated.
Just to provide the answer to the question: the MySQL statement
LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'file_name'
[REPLACE | IGNORE]
INTO TABLE tbl_name
[CHARACTER SET charset_name]
[{FIELDS | COLUMNS}
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
[ESCAPED BY 'char']
]
[LINES
[STARTING BY 'string']
[TERMINATED BY 'string']
]
[IGNORE number LINES]
[(col_name_or_user_var,...)]
[SET col_name = expr,...]
allows data to be loaded into specific columns by specifying the column names. I struggled to get this working by manually creating the query, but I was able to modify the MySqlBulkLoader class. By default that class does not allow the columns to be set, as the columns list is read-only, but I modified it so the column names can be set. Once the modification was made, the data was written into the correct columns and the timestamp was maintained.
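Applied to the example above, the generated statement would look something like this (the file name and table name here are assumptions):
LOAD DATA LOCAL INFILE 'people.csv'
INTO TABLE people
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(Name, Age, Height);
The Timestamp column is simply left out of the column list, so MySQL fills it with its default/auto-update value instead of data from the file.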
Here is the scenario:
Config Table:
+--------+-----------+-------+
| Prefix | Separator | Seed |
+--------+-----------+-------+
| A | # | 10000 |
+--------+-----------+-------+
Transaction Table:
+----+----------+------+
| Id | SerialNo | Col3 |
+----+----------+------+
| 1 | A#10000 | |
| 2 | A#10001 | |
+----+----------+------+
The Transaction table has a SerialNo column containing a sequential number generated from the configuration table, which determines the prefix, separator, and seed value of the serial number.
In the above example the serial numbers start at A#10000 and increment by 1.
But if, after a few months, someone updates the configuration table to:
+--------+-----------+-------+
| Prefix | Separator | Seed |
+--------+-----------+-------+
| B | # | 10000 |
+--------+-----------+-------+
Then the Transaction table is supposed to look something like this:
+----+----------+------+
| Id | SerialNo | Col3 |
+----+----------+------+
| 1 | A#13000 | |
| 2 | B#10001 | |
+----+----------+------+
However, there must be no duplicate serial numbers in the Transaction table at any given point in time.
If someone sets the prefix back to A and the seed to 10000, the next serial number should not be A#10000, because it already exists; it should be A#13001.
One could simply write a SELECT query with MAX() and CONCAT() (see the sketch below), but that could cause issues with concurrency. I don't want duplicate serial numbers, and I would also like this to be as performance-friendly as possible.
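For reference, the kind of query I mean would be something like this (table and column names assumed); on its own, two sessions can read the same MAX value and produce duplicates:
SELECT c.Prefix + c.Separator +
       CAST(ISNULL(MAX(CAST(SUBSTRING(t.SerialNo,
            LEN(c.Prefix + c.Separator) + 1, 20) AS int)) + 1, c.Seed) AS varchar(20))
FROM Config c
LEFT JOIN TransactionTable t
    ON t.SerialNo LIKE c.Prefix + c.Separator + '%'
GROUP BY c.Prefix, c.Separator, c.Seed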
Another solution I could come up with is to create a Windows service that keeps running and watching the table: records get inserted with a null serial number, and the service fills the serial number in afterwards. That way there would be no concurrency issues, but I am not sure how reliable it is, and there will be delays.
There will only be one entry in the configuration table at any given point in time.
You can solve the seed value problem quite easily in SQL Server. When someone updates the seed value back to 10000, do it via a stored procedure. The stored procedure determines what the actual next available value should be, because clearly 10000 could be the wrong value, and then executes DBCC CHECKIDENT with the correct "new_reseed_value". When new records are inserted, the server will again handle the values correctly.
Please look at this link for usage of the DBCC CHECKIDENT command: SQL Server DBCC CHECKIDENT.
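A sketch of what that procedure might look like, assuming the numeric part of SerialNo is driven by an identity column on a counter table (dbo.SerialCounter and dbo.TransactionTable are assumed names):
CREATE PROCEDURE dbo.ReseedSerial
    @Prefix    varchar(10),
    @Separator char(1),
    @Requested int
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @MaxIssued int;
    DECLARE @NewSeed int;

    -- Highest number already issued for this prefix, if any.
    SELECT @MaxIssued = MAX(CAST(SUBSTRING(SerialNo,
        CHARINDEX(@Separator, SerialNo) + 1, 20) AS int))
    FROM dbo.TransactionTable
    WHERE SerialNo LIKE @Prefix + @Separator + '%';

    -- Never reseed below a value that is already in use.
    SET @NewSeed = CASE WHEN @MaxIssued IS NULL THEN @Requested - 1
                        ELSE @MaxIssued END;

    -- For a table that already has rows, the next identity value
    -- handed out after the reseed will be @NewSeed + 1.
    DBCC CHECKIDENT ('dbo.SerialCounter', RESEED, @NewSeed);
END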
I was wondering if there is a way to qualify the table mappings when using SqlBulkCopy in C#.
Currently, I have a table that contains stock codes and then columns associated with a range of weekly buckets.
Example:
Stock Code | 11-2013 | 12-2013 | 13-2013 | 14-2013, etc.
I have a query that returns quantities for the given stock code and the week number in which they occurred.
Example:
part a | 20 | 11-2013
part b | 10 | 14-2013
Ideally, there would be a way to use the ColumnMappings.Add method to specify that I would like to map the date column of the table to the date in the returned row of the query. I would show what I have; however, I have no idea if this is even possible. Any suggestions or alternative ideas would be great.
Thanks
Not directly possible. Your source data has to match your destination data; the SqlBulkCopy class isn't going to do that for you.
Create a SQL query from your source data that matches the table schema of your destination table. Then you can use the SqlBulkCopy class.
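For example, if the source query returns one row per part and week, a query along these lines (table and column names assumed) reshapes it into the week-per-column layout of the destination table before the bulk copy:
SELECT StockCode,
       SUM(CASE WHEN WeekNo = '11-2013' THEN Qty ELSE 0 END) AS [11-2013],
       SUM(CASE WHEN WeekNo = '12-2013' THEN Qty ELSE 0 END) AS [12-2013],
       SUM(CASE WHEN WeekNo = '13-2013' THEN Qty ELSE 0 END) AS [13-2013],
       SUM(CASE WHEN WeekNo = '14-2013' THEN Qty ELSE 0 END) AS [14-2013]
FROM WeeklyQuantities
GROUP BY StockCode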
I have 3 tables, simplified below:
main_table
id | attribute1_id | attribute2_id | price
attribute1_table
id | attribute_name
attribute2_table
id | attribute_name
I've created a view in SQL Server that joins the tables together to give me the following output:
main_table
id | attribute1_id | attribute2_id | attribute1_name | attribute2_name | price
The problem I have is that I want to show the data in a DataGridView and allow the price to be editable, but I've created a "view", which I take it is not the correct thing to use (i.e. it's called a "view", which doesn't sound editable?).
I know I could create my own script to go through and update only main_table, but I think there must be a way to use a DataGridView / linked datasets with joined tables?
The best thing I can advise is to create a stored procedure that takes all of the parameters and then, within that procedure, does the individual UPDATE statements against the table. That should work fairly well.
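A minimal sketch of such a procedure (assuming price is the only editable column and id identifies the row in main_table):
CREATE PROCEDURE dbo.UpdateMainTablePrice
    @Id    int,
    @Price decimal(10, 2)
AS
BEGIN
    SET NOCOUNT ON;

    -- Only main_table is touched; the joined view simply reflects
    -- the change the next time it is queried.
    UPDATE main_table
    SET price = @Price
    WHERE id = @Id;
END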
I am wondering if it is possible to have a table in C# which allows one of the columns to appear as a row, so that each record is effectively two rows.
I am attempting to create a search engine for documents. I would like document properties such as the title and date created to be put into columns, and also to have an extract of the document in a column; however, I feel it would be more appropriate if the extract was on a new line, similar to how Google displays results with a page extract. I would be grateful for any advice on how this could be achieved. I am currently considering creating a jQuery component and loading it that way, unless there are easier methods. Below is a depiction of how I imagine the table to look:
-----------------------------------------------
|Col 1 | col 2 | col3 |
-----------------------------------------------
|Data | data | data |
|Contents of col4 |
-----------------------------------------------
|Data | data | data |
|Contents of col4 |
-----------------------------------------------
Store the "extract" in it's own column in the database (each row is one "document"). Then, in your view, you can have it be displayed as it's own row in the HTML table. No need for jQuery.
I'm working on a local city project and have some questions on efficiently creating relationships between "parks" and "activities" in Microsoft SQL Server 2000. We are using ASP.NET (C#).
I have my two tables "Parks" and "Activities." I have also created a lookup table with the proper relationships set on the primary keys of both "Parks" and "Activities." My lookup table is called "ParksActitivies."
We have about 30 activities that we can associate with each park. An intern is going to be managing the website, and the activities will be evaluated every 6 months.
So far I have created an admin tool that allows you to add/edit/delete each park. Adding a park is simple. The data is new, so I simply allow them to edit the park details, and associate "Activities" dynamically pulled from the database. This was done in a repeater control.
Editing works, but I don't feel it is as efficient as it could be. Saving the main park details is no problem, as I simply call Save() on the park instance I created. However, to remove the stale records in the lookup table I simply DELETE FROM ParksActitivies WHERE ParkID = @ParkID and then INSERT a record for each of the checked activities.
For the ID column on the lookup table I have an auto-incrementing integer value which, after quite a bit of testing, has gotten into the thousands. While this does work, I feel there has to be a better way to update the lookup table.
Can anyone offer some insight on how I may improve this? I am currently using stored procedures, but I'm not the best at very complex statements.
[ParkID | ParkName | Latitude | Longitude ]
1 | Freemont | -116.34 | 35.32
2 | Jackson | -116.78 | 34.2
[ActivityID | ActivityName | Description ]
1 | Picnic | Blah
2 | Dancing | Blah
3 | Water Polo | Blah
[ID | ParkID | ActivityID ]
1 | 1 | 2
2 | 2 | 1
3 | 2 | 2
4 | 2 | 3
I would prefer to learn how to do it a more universal way as opposed to using Linq-To-SQL or ADO.NET.
"I would prefer to learn how to do it a more universal way as opposed to using LINQ2SQL or ADO.NET."
You're obviously using ADO.NET :). And that's fine; I think you should stick to using stored procedures and DbCommands and such.
If you were using MSSQL 2008 you'd be able to do this using table-valued parameters and the MERGE statement. Since you're using MSSQL 2000 (why?), what you'd need to do is the following:
Send a comma-delimited list of the activity ids (the new ones), along with the ParkId, to your stored proc. The ActivityIds parameter would be a varchar(50), for example.
In your stored proc you can split the ids. The strategy would be: first, for the ids passed in, delete the records that don't match. The SQL for that would be:
DELETE FROM ParkActivities
WHERE ActivityId NOT IN (Some List of Ids)
  AND ParkId = @ParkId
Since your list is a string, you can do it like this:
EXEC('DELETE FROM ParkActivities WHERE ActivityId NOT IN (' + @ActivityIds + ') AND ParkId = ' + CAST(@ParkId AS varchar(10)))
Now you can insert those activities that are not already in the table. The simplest way is to split the comma-delimited list into individual ids, insert them into a temp table, and then insert into ParkActivities by joining against that temp table. A user-defined function for MSSQL 2000 that does the split and returns a table variable with each value on a separate row is described here:
http://msdn.microsoft.com/en-us/library/Aa496058
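Putting the insert step together, it could look something like this (dbo.Split stands in for the splitting function from the article, assumed to return a table with a Value column):
-- Add only the activities the park does not already have.
INSERT INTO ParkActivities (ParkId, ActivityId)
SELECT @ParkId, s.Value
FROM dbo.Split(@ActivityIds, ',') AS s
WHERE NOT EXISTS (SELECT 1
                  FROM ParkActivities pa
                  WHERE pa.ParkId = @ParkId
                    AND pa.ActivityId = s.Value)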
What is wrong with LINQ to SQL and ADO.NET? I mean, could you specify your doubts about using those technologies?
Update:
If LINQ to SQL is not supported for SQL Server 2000, you can easily upgrade to the free SQL Server 2008 Express. It would definitely be enough for the purposes you described.