I am developing a sophisticated data transformation script using T-SQL. The script has to convert data between the new database schema and the legacy schema.
To test the script I perform the following actions:
Create or reset both databases.
Make a small change in the "new" database that I want to test (add a new entry, delete an entry, etc.)
Run the script to sync both instances.
Look in the "old" instance to see whether the change was correctly propagated there.
I do everything manually and this is really mundane work. What I would like is a framework or tool that would automate steps 1, 3 and 4, allow me to script my changes and the assertions (like in a normal unit test), and run multiple tests.
I looked at SQL Server Data Tools, but it provides very limited support for SQL unit testing. This is why I am looking for alternatives, or for an extended MSTest- or xTest-based example of such automation.
I don't know the complexity of your tests, but maybe rolling back the changes made during the test will help you (as far as I understand, the main difficulty is restoring the initial state for testing)?
-- The table from old database
CREATE TABLE [dbo].[People]
(
[Id] [int] NOT NULL PRIMARY KEY,
[FullName] [nvarchar](100) NOT NULL
)
-- The table from new database
CREATE TABLE [dbo].[People](
[Id] [int] NOT NULL PRIMARY KEY,
[FirstName] [nvarchar](100) NOT NULL,
[LastName] [nvarchar](100) NULL
)
Sample test (it should transfer all non-existing records into the old table):
BEGIN TRAN
INSERT INTO
[Old_Database].[dbo].[People]
SELECT
New_People.[Id],
(New_People.[FirstName] + ' ' + New_People.[LastName]) AS FullName
FROM
[New_Database].[dbo].[People] AS New_People
WHERE
NOT EXISTS(SELECT [Id] FROM [Old_Database].[dbo].[People] WHERE [Old_Database].[dbo].[People].[Id] = New_People.Id)
IF (@@ROWCOUNT = 0)
PRINT('Failed!');
ELSE
PRINT('Passed.');
-- We can look at what was changed
SELECT * FROM [Old_Database].[dbo].[People]
-- Do not commit the changes. This allows running the test many times
ROLLBACK TRAN
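Building on the same rollback idea, here is a variation that raises an error instead of printing, so a test harness (for example MSTest driving the database through ADO.NET) can pick the failure up automatically; the inserted values and the name of the sync procedure are made up:
BEGIN TRAN;

-- Arrange: the change under test in the new database
INSERT INTO [New_Database].[dbo].[People] ([Id], [FirstName], [LastName])
VALUES (42, N'John', N'Doe');

-- Act: run the transformation script (hypothetical procedure name)
-- EXEC [dbo].[usp_SyncNewToOld];

-- Assert: the record must have been propagated to the old schema
IF NOT EXISTS (SELECT 1 FROM [Old_Database].[dbo].[People]
               WHERE [Id] = 42 AND [FullName] = N'John Doe')
    RAISERROR('Record 42 was not propagated to the old database.', 16, 1);

-- Leave both databases as they were
ROLLBACK TRAN;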
I've created an application that has crept into production; it has several tables like the one below.
I have a search query similar to the one below for each table. The database is growing by several thousand rows per day, and I'm concerned about performance moving forward.
Can anyone suggest how I should re-engineer this process to increase efficiency?
I'm using Entity framework, C# and SQL Server.
Also, is it possible to estimate system resource requirements for a database like this? Let's say, for example, I had 600,000 rows?
Thanks in advance for the replies!
select top 100 *
from table
where given_name.contains(search)
or family_name.contains(search)
or session_number.contains(search)
Table structure:
[id] [int] IDENTITY(1,1) NOT NULL,
[given_name] [nvarchar](100) NULL,
[family_name] [nvarchar](100) NULL,
[session_number] [nvarchar](100) NULL,
[birth_date] [datetime2](7) NULL,
[start_date] [datetime2](7) NULL,
[reported_date] [datetime2](7) NULL,
[confirmed_date] [datetime2](7) NULL,
[dir_name] [nvarchar](100) NULL,
[info] [text] NULL,
[complete] [bit] NULL,
[approved_by] [uniqueidentifier] NULL,
[reported_by] [uniqueidentifier] NULL,
[code] [nvarchar](10) NULL,
[sex] [bit] NULL,
[emergency] [bit] NULL,
[release] [bit] NULL,
[stop] [bit] NULL,
600,000 rows is not that many rows, so you can actually carry on with your approach.
If it increases, there is one problem and one potential problem:
The query has a Contains clause that EF translates into SQL similar to the pattern LIKE '%...%'. The optimizer will not use an index on given_name, family_name or session_number. You can evaluate SQL Server full-text search, which is not directly supported by EF, but there are some libraries (a few lines of code) to enable support. You can find one of them here:
http://www.entityframework.info/Home/FullTextSearch
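To make the first point concrete, this is roughly the shape of SQL that EF generates for those Contains calls (the table and parameter names here are placeholders based on the question; the exact generated SQL differs between EF versions):
SELECT TOP (100) *
FROM [YourTable]
WHERE given_name LIKE '%' + @search + '%'
   OR family_name LIKE '%' + @search + '%'
   OR session_number LIKE '%' + @search + '%';
The leading '%' is what stops the optimizer from seeking an index on these columns; it can only scan.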
The second problem is related to OR optimization when the right indexes do exist (which is not your case!). The DBMS can then work in one of two ways:
Table scan, evaluating the WHERE clause row by row;
Build 3 different temporary result sets (one for every OR condition) and then merge them with a UNION (not a UNION ALL, which is less expensive but gives a different result). A UNION is a merge of tables, so the DBMS may have to sort the temporary results if there are many records, or just work on the 3 temporary results with scans (sorted on the row id, not on the index used to resolve the OR clause).
So it is quite an expensive approach in both cases, but if you really need an OR clause it is the best one available. Also, the DBMS works from index statistics, so it will probably make a better choice than a programmer would (we hope so).
If you don't mind duplicated records, you can split the query into 3 different queries and combine them with a SQL UNION ALL (Concat in LINQ), as sketched below. Take care, because a single record that matches more than one OR condition will appear more than once.
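A sketch of that split, reusing the placeholder names from above (duplicates are possible, as noted):
SELECT TOP (100) * FROM [YourTable] WHERE given_name LIKE '%' + @search + '%'
UNION ALL
SELECT TOP (100) * FROM [YourTable] WHERE family_name LIKE '%' + @search + '%'
UNION ALL
SELECT TOP (100) * FROM [YourTable] WHERE session_number LIKE '%' + @search + '%';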
I think you can create a stored procedure to handle your search. Also, to avoid the OR you could use Full-Text Search. Then use it in the stored procedure like this:
CREATE PROCEDURE prc_SearchTable
@searchTerm VARCHAR(100)
-- searchTerm should be like *john*
AS
BEGIN
SELECT *
FROM theTable
WHERE CONTAINS((given_name, family_name, session_number), @searchTerm)
END
Make sure you add the wild cards for Full-Text Search: *term* (written without spaces).
You can add the stored procedure to EF as described here.
I've searched every way I can come up with, but can't find a technique for initializing a DataTable to match a UDT table declared in our DB. I could manually go through and add columns, but I don't want to duplicate the structure in both places. For a normal table, one option would be to simply issue a "select * where ..." that returns no results. But can something like this be done for a UDT table?
And here is the background problem.
This DB has a sproc that accepts a table-valued parameter that is an instance of the UDT table declared in the same DB. Most of the UDT fields are nullable, and the logic to load the TVP is quite involved. What I hoped to do is initialize the DataTable, then insert rows as needed and set the required column/field values as I go, until I'm ready to toss the result to SQL Server for final processing.
I can certainly add the dozen or more fields in code, but the details are still in flux (and may continue to be so for some time), which is one reason I don't really want to have to load all the columns in code.
So, is there a reasonable solution, or am I barking up the wrong tree? I've already spent more time looking for the solution I expected to exist than it would have taken to write the column loading code 100 times over, but now I just want to know if it's possible.
OK, I was discussing this with a friend who is MUCH more SQL savvy than I am (doesn't take much), and he suggested the following SQL query:
"DECLARE #TVP as MyUDTTable; SELECT * FROM #TVP"
This appears to give me exactly what I want, so I'm updating here should some other poor sap want something similar in the future. Perhaps others may offer different or better answers.
Here is an example of how I did this. This style of input/output is something a co-worker and I put together to allow quick and effective use of Entity Framework on his side while keeping my options open to use all the SQL toys. If that is the same use case as yours, you might also like the OUTPUT clause I used here. It spits the newly created ids right back at whatever method calls the proc, allowing the program to go right on to the next activity without pestering my database for the numbers.
My UDT:
CREATE TYPE [dbo].[udtOrderLineBatch] AS TABLE
(
[BrandId] [bigint] NULL,
[ProductClassId] [bigint] NULL,
[ProductStatus] [bigint] NULL,
[Quantity] [bigint] NULL
)
and the procedure that takes it as an input:
create procedure [ops].[uspBackOrderlineMultipleCreate]
@parmBackOrderId int
,@UserGuid uniqueidentifier = null
,@parmOrderLineBatch as udtOrderLineBatch readonly
as
begin
insert ops.OrderLine
(
BrandId
,ProductClassId
,ProductStatusId
,BackOrderId
,OrderId
,DeliveryId
,CreatedDate
,CreatedBy)
output cast(inserted.OrderLineId as bigint) OrderLineId
select line.BrandId
,line.ProductClassId
,line.ProductStatus
,@parmBackOrderId
,null
,null
,getdate()
,@UserGuid
from @parmOrderLineBatch line
join NumberSequence seq on line.Quantity >= seq.Number
end
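For completeness, this is roughly how the type and procedure above are exercised from T-SQL; the values are made up, and I'm assuming NumberSequence holds the integers 1..N so each batch line expands into Quantity order lines (from .NET you would pass a DataTable or a DbDataReader as the structured parameter instead):
declare @batch udtOrderLineBatch;

-- one batch line that should expand into 3 order lines
insert into @batch (BrandId, ProductClassId, ProductStatus, Quantity)
values (1, 2, 1, 3);

exec [ops].[uspBackOrderlineMultipleCreate]
    @parmBackOrderId = 12345,
    @UserGuid = '00000000-0000-0000-0000-000000000000',
    @parmOrderLineBatch = @batch;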
I may be missing something here, but I've searched for hours and I'm either not finding what I need, or I'm not searching on the correct terms. Nevertheless, this is what I'm trying to do.
I'm currently exploring migrating from EF to plain old ADO. I'm happy that, whilst there is a development hit in doing so, all current testing points to ADO still being many times faster than EF (which, given that EF is built on ADO, makes sense).
Where I am a little stumped is generating an update statement for a table row, and an efficient one. Any update statement may change values in 1 or 10 fields, but it's clearly more efficient to only post the data that needs changing.
My question is: what is the best way to generate the update statement so as to remain protected from SQL injection?
For instance, one column value update would be
update Table1 set Column2 = 'somevalue' WHERE Column1 = @id;
Where two columns would be
update Table1 set Column2 = 'somevalue', Column3 = 'some other value' WHERE Column1 = @id;
Does anyone have any best practices on how they handle this, please?
Additional Information:
I've had this down-voted, but quite honestly I think that is because I haven't made myself clear in what I want.
Let me start by confirming that I understand I have the options of straightforward SQL commands (which I am fairly competent with) or placing the said command within a stored procedure and calling either from ADO. I also fully understand the importance of using parameters in any SQL statement where user input is placed.
Imagine the following table:
DECLARE @example TABLE
(
Id INT IDENTITY NOT NULL,
Name VARCHAR(50) NOT NULL,
Description VARCHAR(1000) NOT NULL
);
-- Indexes omitted for simplicity
Now imagine I have an API, allowing users to update a row in this table. The user can update either Name, Description OR both columns, simply by passing the Id. The call is completely disconnected from any "result sets" and therefore I must issue an UPDATE command to the database manually (or through a Stored Procedure).
To keep data transmission to a minimum (therefore helping to maximise performance), I want to cater for the following scenarios:
User updates just Name
UPDATE @example SET [Name] = @name WHERE [Id] = @id;
User updates just Description
UPDATE @example SET [Description] = @description WHERE [Id] = @id;
User updates both
UPDATE @example SET [Name] = @name, [Description] = @description WHERE [Id] = @id;
After all, with each call, I don't know what the caller wishes to update.
In reality, tables can have many, many columns, and it is completely ridiculous to create the relevant SQL statements for every possible combination - let alone the ludicrous effort it would require to keep them updated.
What I'm looking for (and seem to be missing in my searches) is how to generate a safe SQL statement that caters for each option based on what the user supplies, AND uses parameters, AND generates the smallest query possible - needed because we cannot update a column value if the user did not pass a value for it.
I hope this helps to clarify the requirement better.
Parameterize ALL values in ALL cases. This will ensure you avoid SQL injection attacks. As far as patterns for tracking which fields have changed and thus need updating, that is a larger exercise with many examples available on the interwebs for your reading enjoyment.
update Table1
set Column2 = @Column2,
Column3 = @Column3
where Column1 = @Column1
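If you also want a single statement that only touches the columns the caller actually supplied, one common pattern is a stored procedure with nullable parameters, where NULL means "not supplied". This is a sketch against a hypothetical persisted version of the example table (the procedure and table names are made up); it works here because Name and Description are declared NOT NULL, so NULL can safely mean "leave alone":
CREATE PROCEDURE dbo.usp_UpdateExample
    @id INT,
    @name VARCHAR(50) = NULL,
    @description VARCHAR(1000) = NULL
AS
BEGIN
    -- Only overwrite a column when a value was supplied for it
    UPDATE dbo.Example
    SET [Name] = COALESCE(@name, [Name]),
        [Description] = COALESCE(@description, [Description])
    WHERE [Id] = @id;
END
The other common route is to build the SET list dynamically in code and run it through sp_executesql, still binding every value as a parameter; that gives the smallest possible statement at the cost of a few more cached plans.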
I am writing a .NET application that interacts with a MySQL database using an ODBC connection. My application will create the database schema and tables upon start-up. However, I encountered the weird, unexplainable case below:
First my application will create the following table
CREATE TABLE IF NOT EXISTS `sample` (
`item_ID` varchar(20) NOT NULL,
`item_No` int(11) NOT NULL,
`sample_col1` varchar(20) NOT NULL,
`sample_col2` varchar(20) NOT NULL,
PRIMARY KEY (`item_ID`, `item_No`)
) ENGINE=InnoDB;
Populate the table with
INSERT INTO sample SET item_ID='abc', item_No=1, sample_col1='', sample_col2='';
INSERT INTO sample SET item_ID='abc', item_No=2, sample_col1='', sample_col2='';
Then I execute a SELECT query from within the .NET application using the following code:
Dim query As String
query = "SELECT item_ID, item_No, sample_col1, sample_col2 FROM sample"
Dim conn As New OdbcConnection("Driver={MySQL ODBC 5.1 Driver};Server=localhost; port=3306;Database=test; User=dbuser;Password=dbpassword;Option=3;")
Dim cmd As New OdbcDataAdapter(query, conn)
Dim dt As New DataTable
cmd.FillSchema(dt, SchemaType.Source)
cmd.Fill(dt)
conn.Close()
The cmd.Fill(dt) line will throw an Exception: "Failed to enable constraints. One or more rows contain values violating non-null, unique, or foreign-key constraints".
However if I modify the table creation query to:
CREATE TABLE IF NOT EXISTS `sample` (
`item_ID` varchar(20) NOT NULL,
`item_No` int(11) NOT NULL,
`sample_col1` varchar(20) NOT NULL,
`sample_col2` varchar(20) NOT NULL,
PRIMARY KEY (`item_No`,`item_ID`) -- Here the primary key order is inverted
) ENGINE=InnoDB;
The VB.NET code then works perfectly. Notice that in the second creation query I inverted the order of the columns in the PRIMARY KEY: in the first example I put item_ID first, while in the second example I put item_No first.
Does anyone have any clue why this is happening? Also, is there any difference, from the point of view of the MySQL database, between the first and the second create query?
Any input will be appreciated. Thank you !
I need your help :)
I have a table in a database (SQL Server 2008 R2). Currently there are around 4M rows.
Consumer apps take rows from there (lock them and process).
To protect rows from being taken by more than one consumer, I'm locking them by adding a flag in the appropriate column...
So, to "lock" record I do
SELECT TOP 1 .....
and then an UPDATE operation on the record with that specific ID.
This operation now takes up to 5 seconds (I tried it in SQL Server Management Studio):
SELECT TOP 1 *
FROM testdb.dbo.myTable
WHERE recordLockedBy is NULL;
How can I speed it up?
Here is the table structure:
CREATE TABLE [dbo].[myTable](
[id] [int] IDENTITY(1,1) NOT NULL,
[num] [varchar](15) NOT NULL,
[date] [datetime] NULL,
[field1] [varchar](150) NULL,
[field2] [varchar](150) NULL,
[field3] [varchar](150) NULL,
[field4] [varchar](150) NULL,
[date2] [datetime] NULL,
[recordLockedBy] [varchar](100) NULL,
[timeLocked] [datetime] NULL,
[field5] [varchar](100) NULL);
Indexes should be placed on any columns you use in your query's where clause. Therefore you should add an index to recordLockedBy.
If you don't know about indexes look here
A quick starter for you:
CREATE INDEX IDX_myTable_recordLockedBy
ON dbo.myTable (recordLockedBy);
Does your select statement query by id as well? If so, id should be set as the primary key with a clustered index (the default for PKs, I believe). SQL Server will then be able to jump directly to the record - it should be near instant. Without it, it will do a table scan, looking at every record in the order it appears on disk until it finds the one you're after.
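A minimal sketch of that suggestion against the table from the question (the constraint name is illustrative):
ALTER TABLE [dbo].[myTable]
ADD CONSTRAINT PK_myTable PRIMARY KEY CLUSTERED ([id]);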
This won't prevent a race condition on the table, though, and the same row can still end up being processed by multiple consumers.
Look at UPDLOCK and READPAST locking hints to handle this case:
http://www.mssqltips.com/sqlservertip/1257/processing-data-queues-in-sql-server-with-readpast-and-updlock/
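A minimal sketch of that pattern against the table from the question (the consumer name is illustrative; see the linked article for the full treatment):
-- Claim one unlocked row and return it in a single atomic statement.
-- READPAST skips rows other consumers have already locked; UPDLOCK holds the claimed row.
UPDATE TOP (1) t
SET recordLockedBy = 'worker-01',
    timeLocked = GETDATE()
OUTPUT inserted.id, inserted.num
FROM testdb.dbo.myTable AS t WITH (UPDLOCK, READPAST, ROWLOCK)
WHERE t.recordLockedBy IS NULL;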
If the table is used for job scheduling and processing, perhaps you can use MSMQ to solve this problem. You wouldn't need to worry about locking and things like that. It also scales much better in the enterprise and has many different send/receive modes.
You can learn more about it here:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms711472(v=vs.85).aspx