Handling large amounts of data in SQL - C#

I've just taken over a project at work, and my boss has asked me to make it run faster. Great.
So I've identified one of the major bottlenecks: searching one particular table in our SQL Server database, which can take up to a minute, sometimes longer, for a SELECT query with some filters on it to run. Below is the SQL generated by C# Entity Framework (minus all the GO statements):
CREATE TABLE [dbo].[MachineryReading](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Location] [geometry] NULL,
[Latitude] [float] NOT NULL,
[Longitude] [float] NOT NULL,
[Altitude] [float] NULL,
[Odometer] [int] NULL,
[Speed] [float] NULL,
[BatteryLevel] [int] NULL,
[PinFlags] [bigint] NOT NULL, -- Deprecated field, this is now stored in a separate table
[DateRecorded] [datetime] NOT NULL,
[DateReceived] [datetime] NOT NULL,
[Satellites] [int] NOT NULL,
[HDOP] [float] NOT NULL,
[MachineryId] [int] NOT NULL,
[TrackerId] [int] NOT NULL,
[ReportType] [nvarchar](1) NULL,
[FixStatus] [int] NOT NULL,
[AlarmStatus] [int] NOT NULL,
[OperationalSeconds] [int] NOT NULL,
CONSTRAINT [PK_dbo.MachineryReading] PRIMARY KEY NONCLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
)
ALTER TABLE [dbo].[MachineryReading] ADD DEFAULT ((0)) FOR [FixStatus]
ALTER TABLE [dbo].[MachineryReading] ADD DEFAULT ((0)) FOR [AlarmStatus]
ALTER TABLE [dbo].[MachineryReading] ADD DEFAULT ((0)) FOR [OperationalSeconds]
ALTER TABLE [dbo].[MachineryReading] WITH CHECK ADD CONSTRAINT [FK_dbo.MachineryReading_dbo.Machinery_MachineryId] FOREIGN KEY([MachineryId])
REFERENCES [dbo].[Machinery] ([Id])
ON DELETE CASCADE
ALTER TABLE [dbo].[MachineryReading] CHECK CONSTRAINT [FK_dbo.MachineryReading_dbo.Machinery_MachineryId]
ALTER TABLE [dbo].[MachineryReading] WITH CHECK ADD CONSTRAINT [FK_dbo.MachineryReading_dbo.Tracker_TrackerId] FOREIGN KEY([TrackerId])
REFERENCES [dbo].[Tracker] ([Id])
ON DELETE CASCADE
ALTER TABLE [dbo].[MachineryReading] CHECK CONSTRAINT [FK_dbo.MachineryReading_dbo.Tracker_TrackerId]
The table has indexes on MachineryId, TrackerId, and DateRecorded:
CREATE NONCLUSTERED INDEX [IX_MachineryId] ON [dbo].[MachineryReading]
(
[MachineryId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
CREATE NONCLUSTERED INDEX [IX_MachineryId_DateRecorded] ON [dbo].[MachineryReading]
(
[MachineryId] ASC,
[DateRecorded] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
CREATE NONCLUSTERED INDEX [IX_TrackerId] ON [dbo].[MachineryReading]
(
[TrackerId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
When we select from this table, we are almost always interested in one machinery or tracker, over a given date range:
SELECT *
FROM MachineryReading
WHERE MachineryId = 2127 AND
DateRecorded > '2016-12-08 00:00:10.009' AND DateRecorded < '2016-12-11 18:32:41.734'
As you can see, it's quite a basic setup. The main problem is the sheer amount of data we put into it - about one row every ten seconds per tracker, and we have over a hundred trackers at the moment. We're currently sitting somewhere around 10-15 million rows. So this leaves me with two questions.
Am I thrashing the database if I insert 10 rows per second (without batching them)?
Given that this is historical data, so once it is inserted it will never change, is there anything I can do to speed up read access?

You have too many non-clustered indexes on the table, which increases the size of the database.
If you have an index on (MachineryId, DateRecorded), you don't really need a separate one on MachineryId alone.
With three non-clustered indexes, you are storing three extra copies of the indexed data.
Clustered vs. non-clustered
There is no INCLUDE on the non-clustered index. When SQL Server executes your query, it first searches the non-clustered index for the matching rows, then goes back to the base table (a bookmark lookup) to fetch the rest of the columns, because you are doing SELECT * and the non-clustered index doesn't contain all the columns. (That is what I think is happening; I can't tell for sure without the query plan.)
Include columns in non-clustered index: https://stackoverflow.com/a/1308325/1910735
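As a rough sketch of a covering index for the query in the question (the INCLUDE list is purely illustrative; include only the columns your queries actually select, which also means replacing SELECT * with an explicit column list):
CREATE NONCLUSTERED INDEX [IX_MachineryId_DateRecorded_Covering]
ON [dbo].[MachineryReading] ([MachineryId] ASC, [DateRecorded] ASC)
INCLUDE ([Latitude], [Longitude], [Speed], [OperationalSeconds])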
You should maintain your indexes: create a maintenance plan that checks for fragmentation and rebuilds or reorganizes them on a weekly basis.
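For example, a manual maintenance pass on the composite index might look like this (the choice between REORGANIZE and REBUILD depends on the measured fragmentation; the thresholds in the comments are a commonly cited rule of thumb, not a hard rule):
-- Low fragmentation (roughly 5-30%):
ALTER INDEX [IX_MachineryId_DateRecorded] ON [dbo].[MachineryReading] REORGANIZE
-- High fragmentation (roughly over 30%):
ALTER INDEX [IX_MachineryId_DateRecorded] ON [dbo].[MachineryReading] REBUILD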
I really think you should have a clustered index on (MachineryId, DateRecorded) instead of a non-clustered one. A table can have only one clustered index (it defines the order the data is stored on disk), and since most of your queries are in MachineryId and DateRecorded order, it is better to store the rows that way.
If you really are searching by TrackerId in any query, consider adding it to the same clustered index.
IMPORTANT NOTE: drop the non-clustered index in a TEST environment before going live.
Create a clustered index in place of your non-clustered one, run your typical queries, and check the performance by comparing the query plans and the STATISTICS IO output.
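A sketch of that change, to be tried in a test environment first as noted above (the clustered index name is illustrative; this works because the primary key here is declared NONCLUSTERED):
-- Drop the non-clustered indexes this replaces:
DROP INDEX [IX_MachineryId] ON [dbo].[MachineryReading]
DROP INDEX [IX_MachineryId_DateRecorded] ON [dbo].[MachineryReading]
-- Store the rows physically ordered by machinery, then date:
CREATE CLUSTERED INDEX [CIX_MachineryReading_MachineryId_DateRecorded]
ON [dbo].[MachineryReading] ([MachineryId] ASC, [DateRecorded] ASC)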
Some resources for index and SQL query help:
Subscribe to the newsletter here and download the First Responder Kit:
https://www.brentozar.com/?s=first+responder
The kit is now open source, though I don't know whether it still includes the getting-started PDF and help files (subscribe at the link above anyway, for the weekly articles/tutorials):
https://github.com/BrentOzarULTD/SQL-Server-First-Responder-Kit

Tuning is per query, but in any case:
I see you have no partitions and no suitable index for this query, which means that no matter what you do, it always results in a full table scan.
For your specific query -
create index ix_MachineryReading_MachineryId_DateRecorded
on MachineryReading (MachineryId, DateRecorded)

First, 10 inserts per second is very feasible under almost any reasonable circumstances.
Second, you need an index. For this query:
SELECT *
FROM MachineryReading
WHERE MachineryId = 2127 AND
DateRecorded > '2016-12-08 00:00:10.009' AND DateRecorded < '2016-12-11 18:32:41.734';
You need an index on MachineryReading(MachineryId, DateRecorded). That will probably solve your performance problem.
If you have similar queries for tracker, then you want an index on MachineryReading(TrackerId, DateRecorded).
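In T-SQL, those two indexes might look like the following (index names are illustrative; note that the schema in the question already includes the first of these):
CREATE NONCLUSTERED INDEX [IX_MachineryId_DateRecorded]
ON [dbo].[MachineryReading] ([MachineryId] ASC, [DateRecorded] ASC)
CREATE NONCLUSTERED INDEX [IX_TrackerId_DateRecorded]
ON [dbo].[MachineryReading] ([TrackerId] ASC, [DateRecorded] ASC)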
These will slightly slow down the inserts, but the overall improvement should be so great that it will be a big win.


Computed primary key doesn't get updated in EF Core entity after insert

I have a special case where the Id of the table is defined as a computed column like this:
CREATE TABLE [BusinessArea](
[Id] AS (isnull((CONVERT([nvarchar],[CasaId],(0))+'-')+CONVERT([nvarchar],[ConfigurationId],(0)),'-')) PERSISTED NOT NULL,
[CasaId] [int] NOT NULL,
[ConfigurationId] [int] NOT NULL,
[Code] [nvarchar](4) NOT NULL,
[Name] [nvarchar](50) NOT NULL,
CONSTRAINT [PK_BusinessArea] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]
) ON [PRIMARY]
GO
Usually when I have a computed column I configure it like this:
builder.Entity<MyEntity>()
.Property(p => p.MyComputed).HasComputedColumnSql(null);
With .HasComputedColumnSql() the value of MyComputed is reflected after an insert/update on the entity.
However this trick doesn't work if the computed column is a PK.
Any idea on how to make that work also with a PK?
It can be made to work, but only for inserts, by setting the property's BeforeSaveBehavior to Ignore:
modelBuilder.Entity<BusinessArea>().Property(e => e.Id)
.Metadata.BeforeSaveBehavior = PropertySaveBehavior.Ignore;
But in general, such a design will cause problems with EF Core, because it doesn't support mutable keys (primary or alternate). That means it would never retrieve the Id back from the database after an update. You can verify that by marking the property as ValueGeneratedOnAddOrUpdate (which is the normal behavior for computed columns):
modelBuilder.Entity<BusinessArea>().Property(e => e.Id)
.ValueGeneratedOnAddOrUpdate();
If you do so, EF Core will throw an InvalidOperationException saying:
The property 'Id' cannot be configured as 'ValueGeneratedOnUpdate' or 'ValueGeneratedOnAddOrUpdate' because the key value cannot be changed after the entity has been added to the store.

Updating column in MSSQL quickly

I am using a Microsoft SQL Server Web edition instance on Amazon RDS. The system is currently generating timeouts when updating one column, and I am trying to resolve the issue, or at least minimize it. The updates occur when a device calls in, and devices call in a lot, to the point where a device may call back before the web server has finished handling its previous call.
Microsoft SQL Server Web (64-bit)
Version 13.0.4422.0
I see a couple of possibilities here. The first is that a device calls back before the system has finished handling its last call, so the same record is updated multiple times concurrently. The second is that I am running into a row lock or table lock.
The table has about 3,000 records in total.
Note that I am only trying to update one column in one row at a time; the other columns are never updated.
I don't need the last-updated time to be very accurate. Would there be any benefit to changing the code to only update the column if it is, say, more than a few minutes old, or would that just add more load to the server? Any suggestions on how to optimize this? Maybe move it to a function, a stored procedure, or something else?
Suggested new code:
UPDATE [Devices] SET [LastUpdated] = GETUTCDATE()
WHERE [Id] = @id AND
([LastUpdated] IS NULL OR DATEDIFF(MI, [LastUpdated], GETUTCDATE()) > 2);
Existing update code:
internal static async Task UpdateDeviceTime(ApplicationDbContext db, int deviceId, DateTime dateTime)
{
var parm1 = new System.Data.SqlClient.SqlParameter("@id", deviceId);
var parm2 = new System.Data.SqlClient.SqlParameter("@date", dateTime);
var sql = "UPDATE [Devices] SET [LastUpdated] = @date WHERE [Id] = @id";
// timeout occurs here.
var cnt = await db.Database.ExecuteSqlCommandAsync(sql, new object[] { parm1, parm2 });
}
Table creation script:
CREATE TABLE [dbo].[Devices](
[Id] [int] IDENTITY(1,1) NOT NULL,
[CompanyId] [int] NOT NULL,
[Button_MAC_Address] [nvarchar](17) NOT NULL,
[Password] [nvarchar](max) NOT NULL,
[TimeOffset] [int] NOT NULL,
[CreationTime] [datetime] NULL,
[LastUpdated] [datetime] NULL,
CONSTRAINT [PK_Devices] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
ALTER TABLE [dbo].[Devices] ADD CONSTRAINT [DF_Devices_CompanyId] DEFAULT ((1)) FOR [CompanyId]
GO
ALTER TABLE [dbo].[Devices] ADD CONSTRAINT [DF_Devices_TimeOffset] DEFAULT ((-5)) FOR [TimeOffset]
GO
ALTER TABLE [dbo].[Devices] ADD CONSTRAINT [DF_Devices_CreationTime] DEFAULT (getdate()) FOR [CreationTime]
GO
ALTER TABLE [dbo].[Devices] ADD CONSTRAINT [PK_Devices] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
You should look into the cause using a tool such as Profiler, or other techniques for detecting blocking. I don't see why you would have a problem updating one column in a table with only 3,000 records; it might have something to do with your constraints.
If it really is a timing issue, you can consider In-Memory OLTP, which is designed to handle this type of scenario.
The last-updated time could also be stored in a transaction-style table with a link back to this table, retrieving the latest value with a join using MAX(UpdatedTime). In that case you would never update rows, only add new ones (see the sketch below).
You can then either use partitioning or a cleanup routine to keep the size of this transaction table down.
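A minimal sketch of that idea, assuming a hypothetical DeviceHeartbeats table (all names here are illustrative):
CREATE TABLE [dbo].[DeviceHeartbeats](
[DeviceId] [int] NOT NULL,
[UpdatedTime] [datetime] NOT NULL CONSTRAINT [DF_DeviceHeartbeats_UpdatedTime] DEFAULT (GETUTCDATE()),
CONSTRAINT [FK_DeviceHeartbeats_Devices] FOREIGN KEY([DeviceId]) REFERENCES [dbo].[Devices] ([Id])
)
-- Each device call-in becomes an INSERT, so no row in Devices is ever locked for update:
INSERT INTO [dbo].[DeviceHeartbeats] ([DeviceId]) VALUES (@id)
-- Latest check-in per device:
SELECT [DeviceId], MAX([UpdatedTime]) AS [LastUpdated]
FROM [dbo].[DeviceHeartbeats]
GROUP BY [DeviceId]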
Programming patterns that In-Memory OLTP will improve include
concurrency scenarios, point lookups, workloads where there are many
inserts and updates, and business logic in stored procedures.
https://msdn.microsoft.com/library/dn133186(v=sql.120).aspx

SSIS Parent table Child relation migration

I'm trying to use SSIS to move some data from one SQL Server to my destination SQL Server. The source has a table "Parent" with an identity field ID that is referenced as a foreign key by the "Child" table.
It is a 1-N relation.
The question is simple: what is the best way to transfer the data to a different SQL Server while still keeping the parent-child relation?
Note: both IDs (Parent and Child) are identity fields that we do not want to migrate, since the destination won't necessarily need to have them.
Please share your comments and ideas.
FYI: we wrote .NET code (C#) that does this: one query gets the parent data, another gets the child data, we join the two with LINQ, and we loop over the parents, getting each new ID and inserting it as the reference in the child table. This is working, but we want to build the same thing in SSIS to be able to scale later.
You have to import the Parent table before the Child table.
First, create the tables on the destination server; you can do this with a script like the following:
CREATE TABLE [dbo].[Tbl_Child](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Parent_ID] [int] NULL,
[Name] [varchar](50) NULL,
CONSTRAINT [PK_Tbl_Child] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[Tbl_Parent](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Name] [varchar](50) NULL,
CONSTRAINT [PK_Tbl_Parent] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Tbl_Child] WITH CHECK ADD CONSTRAINT [FK_Tbl_Child_Tbl_Parent] FOREIGN KEY([Parent_ID])
REFERENCES [dbo].[Tbl_Parent] ([ID])
GO
ALTER TABLE [dbo].[Tbl_Child] CHECK CONSTRAINT [FK_Tbl_Child_Tbl_Parent]
GO
Add two OLE DB connection managers (source and destination).
Next, add a Data Flow Task to import the Parent table data from the source, with the Keep Identity option checked.
Then add a Data Flow Task to import the Child table data from the source, again with the Keep Identity option checked.
Workaround: you can disable the foreign key constraint, import the data, then re-enable it, by adding an Execute SQL Task before and after the import.
Disable Constraint:
ALTER TABLE Tbl_Child NOCHECK CONSTRAINT FK_Tbl_Child_Tbl_Parent
Enable Constraint:
ALTER TABLE Tbl_Child CHECK CONSTRAINT FK_Tbl_Child_Tbl_Parent
If you use this workaround, it is not necessary to follow any particular order when importing.

Best Approach to audit changes in the table

I'm designing a .NET application where the user can copy, edit, update, or delete records in a DataGrid element that gets its data from a table.
In the design I need to be able to maintain versions, so that all the changes users make to the table can be reviewed. Can someone suggest the best way to implement this requirement?
Thanks a lot.
As mentioned by Sean Lange, this is a complex topic and there is no silver bullet.
Change Data Capture is designed explicitly for this problem.
If you don't want to enable CDC, then triggers are usually your best bet.
Here is an example of an auditing trigger utilizing rowversion/timestamp:
CREATE TABLE [dbo].[Table1](
[Id] [int] NOT NULL,
[Data] [nvarchar](50) NULL,
[Version] [timestamp] NOT NULL,
CONSTRAINT [PK_Table1] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[Table1_History](
[Id] [int] NOT NULL,
[Data] [nvarchar](50) NULL,
[Version] [binary](8) NOT NULL,
[ModDate] [datetimeoffset](7) NOT NULL,
[ModUser] [nvarchar](50) NOT NULL,
[Operation] CHAR(1) NOT NULL
) ON [PRIMARY]
GO
CREATE TRIGGER [dbo].[trgTable1_History]
ON [dbo].[Table1]
AFTER INSERT,DELETE,UPDATE
AS
BEGIN
SET NOCOUNT ON;
DECLARE @now DATETIMEOFFSET(7) = SYSDATETIMEOFFSET()
INSERT INTO dbo.Table1_History
(Id, Data, Version, ModDate, ModUser, Operation)
SELECT Id, Data, Version, @now, SYSTEM_USER, 'I' from inserted
INSERT INTO dbo.Table1_History
(Id, Data, Version, ModDate, ModUser, Operation)
SELECT Id, Data, Version, @now, SYSTEM_USER, 'D' from deleted
END
The timestamp column will automatically update upon every change to the row.
This gives you current + history in your audit table (which simplifies reporting). The Version column gives you an easy lookup between Table1 and Table1_History, in the event that you want to know the exact audit details of the current row. Updates are designated by a DELETE ('D') and an INSERT ('I') occurring simultaneously in the audit table.
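For example, the lookup that the Version column enables might look like this (a sketch; @Id is a placeholder parameter):
SELECT h.*
FROM [dbo].[Table1] t
JOIN [dbo].[Table1_History] h
ON h.[Id] = t.[Id] AND h.[Version] = t.[Version]
WHERE t.[Id] = @Id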
If you are talking about a database table, then best practice is to never actually delete data from it. Instead, add a flag column to the table which denotes the state of the corresponding row.
You can set it to 0 for active rows and 1 if the row is deleted, so deleting a row becomes an UPDATE operation (see the sketch below). The same approach can be followed for other CRUD operations.
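A minimal sketch of that flag-column approach, reusing the Table1 schema from the trigger example above (the IsDeleted column name is illustrative):
ALTER TABLE [dbo].[Table1] ADD [IsDeleted] [bit] NOT NULL CONSTRAINT [DF_Table1_IsDeleted] DEFAULT ((0))
-- A delete becomes an update of the flag:
UPDATE [dbo].[Table1] SET [IsDeleted] = 1 WHERE [Id] = @Id
-- Queries for active data filter on the flag:
SELECT [Id], [Data] FROM [dbo].[Table1] WHERE [IsDeleted] = 0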

Entity Framework Keeps overwriting attributes. Need other way of solving issue

I have an existing app and database. I have been tasked with adding Entity Framework as part of an upgrade.
I hit a problem where, when I generate (or regenerate) the EDMX, the code no longer recognises the foreign keys in the database tables, and when the code runs it complains about missing IDs, as, I assume, it is 'guessing' what the foreign keys should be.
I can get round this by adding the following attribute to the auto-generated model definitions:
[ForeignKey("NavigationProperty")]
But then, if/when the EDMX is regenerated, all this gets blown away and has to be re-added.
Although the generated class is partial, these attributes are being added to existing members, so I cannot move them to a separate file.
So, how do I get round this? Ideally I'd like to ensure that when the EDMX is generated it picks up the foreign keys, so that this issue is fixed permanently. If that can't be done, the next step is to ask whether there is some way of programmatically generating these associations, so it only has to be done once.
Thanks
edit - Added in sample table definition
Here is the script auto-generated by SSMS. Is there anything wrong with the foreign key definition?
CREATE TABLE [dbo].[ShopProductTypes](
[id] [int] IDENTITY(1,1) NOT NULL,
[Shop_Id] [int] NOT NULL,
[Product_Id] [int] NOT NULL,
[CreatedDate] [datetime] NOT NULL,
[CancelledDate] [datetime] NULL,
[Archived] [bit] NOT NULL,
CONSTRAINT [PK_ShopProductTypes] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[ShopProductTypes] WITH CHECK ADD CONSTRAINT [FK_ShopProductTypes_Shop] FOREIGN KEY([Shop_Id])
REFERENCES [dbo].[Shops] ([Id])
GO
I found this:
http://blogs.msdn.com/b/dsimmons/archive/2007/09/01/ef-codegen-events-for-fun-and-profit-aka-how-to-add-custom-attributes-to-my-generated-classes.aspx
It's a bit more involved.
