I am building a C# application that inserts 2,000 records every second using bulk insert.
The database is SQL Server 2008 R2.
The application calls a stored procedure that deletes records older than 2 hours, in chunks, using TOP (10000). This is performed after each insert.
The end user selects records to view in a diagram using date ranges and a selection of 2 to 10 parameter ids.
Since the application will run 24/7 with no downtime, I am concerned about performance issues.
Partitioning is not an option since the customer doesn't have Enterprise edition.
Is the clustered index definition good?
Is it necessary to implement any index rebuild / reorganization to maintain performance, given that rows are inserted at one end of the table and removed at the other end?
What about update statistics, is it still an issue in 2008 R2?
I use OPTION (RECOMPILE) to avoid using outdated query plans in the select; is that a good approach?
Are there any table hints that can speed up the SELECT?
Any suggestions around locking strategies?
In addition to the scenario above, I have 3 more tables that work in the same way with different timeframes. One inserts every 20 seconds and deletes rows older than 1 week, another inserts every minute and deletes rows older than six weeks, and the last inserts every 5 minutes and deletes rows older than 3 years.
CREATE TABLE [dbo].[BufferShort](
[DateTime] [datetime2](2) NOT NULL,
[ParameterId] [int] NOT NULL,
[BufferStateId] [smallint] NOT NULL,
[Value] [real] NOT NULL,
CONSTRAINT [PK_BufferShort] PRIMARY KEY CLUSTERED
(
[DateTime] ASC,
[ParameterId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
ALTER PROCEDURE [dbo].[DeleteFromBufferShort]
@DateTime DateTime,
@BufferSizeInHours int
AS
BEGIN
DELETE TOP (10000)
FROM BufferShort
FROM BufferStates
WHERE BufferShort.BufferStateId = BufferStates.BufferStateId
AND BufferShort.[DateTime] < @DateTime
AND (BufferStates.BufferStateType = 'A' OR BufferStates.Deleted = 'True')
RETURN 0
END
ALTER PROCEDURE [dbo].[SelectFromBufferShortWithParameterList]
@DateTimeFrom Datetime2(2),
@DateTimeTo Datetime2(2),
@ParameterList varchar(max)
AS
BEGIN
SET NOCOUNT ON;
-- Split ParameterList into a temporary table
SELECT * INTO #TempTable FROM dbo.splitString(@ParameterList, ',');
SELECT *
FROM BufferShort Datapoints
JOIN Parameters P ON P.ParameterId = Datapoints.ParameterId
JOIN #TempTable TT ON TT.Token = P.ElementReference
WHERE Datapoints.[DateTime] BETWEEN #DateTimeFrom AND #DateTimeTo
ORDER BY [DateTime]
OPTION (RECOMPILE)
RETURN 0
END
This is a classic case of penny wise/pound foolish. You are inserting over 170 million records per day and you are not using Enterprise.
The main reason not to use a clustered index is when the machine cannot keep up with the quantity of rows being inserted. Otherwise you should always use a clustered index. The decision of whether to use a clustered index is usually argued between those who believe that every table should have a clustered index and those who believe that perhaps one or two percent of tables should not have one. (I don't have time to engage in a 'religious' type debate about this; just research the web.) I always go with a clustered index unless the inserts on a table are failing.
I would not use the STATISTICS_NORECOMPUTE clause; I would only turn automatic statistics recomputation off if inserts are failing. Please see Kimberly Tripp's (an MVP and a real SQL Server expert) article at http://sqlmag.com/blog/statisticsnorecompute-when-would-anyone-want-use-it.
I would also not use OPTION (RECOMPILE) unless you see queries are not using the right indexes (or join types) in the actual query plan. If your query is executed many times per minute/second this can have an unnecessary impact on the performance of your machine.
The clustered index definition seems good as long as all queries specify at least the leading DateTime column. The index will also maximize insert speed, assuming the times are incremental, as well as reduce fragmentation. You shouldn't need to reorganize or rebuild often.
If you have only the clustered index on this table, I wouldn't expect you need to update stats frequently because there isn't another data access path. If you have other indexes and complex queries, verify the index is branded ascending with the query below. You may need to update stats frequently if it is not branded ascending and you have complex queries:
DBCC TRACEON(2388);
DBCC SHOW_STATISTICS('dbo.BufferShort', 'PK_BufferShort');
DBCC TRACEOFF(2388);
For the @ParameterList, consider a table-valued parameter instead. Specify a primary key of Token on the table type.
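A minimal sketch of that change, assuming a user-defined table type named ParameterTokenList whose Token column matches what splitString produces today (the type name and the varchar(50) size are assumptions, not from the original schema):
CREATE TYPE dbo.ParameterTokenList AS TABLE
(
    Token varchar(50) NOT NULL PRIMARY KEY
);
GO
-- The procedure takes the list directly; no string splitting or temp table needed
ALTER PROCEDURE [dbo].[SelectFromBufferShortWithParameterList]
    @DateTimeFrom datetime2(2),
    @DateTimeTo datetime2(2),
    @ParameterList dbo.ParameterTokenList READONLY
AS
BEGIN
    SET NOCOUNT ON;
    SELECT Datapoints.*
    FROM BufferShort Datapoints
    JOIN Parameters P ON P.ParameterId = Datapoints.ParameterId
    JOIN @ParameterList TT ON TT.Token = P.ElementReference
    WHERE Datapoints.[DateTime] BETWEEN @DateTimeFrom AND @DateTimeTo
    ORDER BY [DateTime];
END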
I would suggest you introduce the RECOMPILE hint only if needed; I suspect you will get a stable plan with a clustered index seek without it.
If you have blocking problems, consider altering the database to specify the READ_COMMITTED_SNAPSHOT option so that row versioning instead of blocking is used for read consistency. Note that this will add 14 bytes of row overhead and use tempdb more heavily, but the concurrency benefits might outweigh the costs.
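For example (a sketch; the database name is a placeholder, and the change needs a moment with no other active connections or the ROLLBACK IMMEDIATE option shown here):
ALTER DATABASE YourBufferDatabase SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;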
Related
I am developing an API in .NET Core where I have to save an integer number in a SQL table. The number must be 9 digits long (any number starting from 000000001), it should never repeat in the future, and I also want this number at the memory level because I have to use it for some other purposes. After doing some research, one of the most common solutions is using DateTime.Now.Ticks and trimming its length, but the problem is that when concurrent HTTP requests come in, the ticks value might be the same.
One solution is to apply a lock on the method and release it when the data has been saved in the database, but this will slow the application down because locks are expensive.
A second solution is to introduce a new table with a counter initialized to 1; on every HTTP request, first apply the UnitOfWork, read the value from the table, increment it by one, save it, and then process the other request. But again there is a performance hit, so it is not an optimal solution.
So, is there any other solution that is faster and less expensive?
Thanks in advance
I think you can use a computed column in combination with an IDENTITY (auto-increment) column. The IDENTITY column gives you a unique number, and the computed column pads it to the required length.
An example:
CREATE TABLE [dbo].[EmployeeMaster](
[ID] [int] IDENTITY(1,1) NOT NULL,
[PreFix] [varchar](50) NOT NULL,
[EmployeeNo] AS ([PreFix] + RIGHT('000000000' + CAST(Id AS VARCHAR(9)), 9)) PERSISTED,
[EmployeeName] VARCHAR(50),
CONSTRAINT [PK_AutoInc] PRIMARY KEY ([ID] ASC)
)
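For example, an insert only supplies the prefix and name; the 9-digit part comes from the identity (the values shown are simply what the definition above would produce for the first row):
INSERT INTO dbo.EmployeeMaster (PreFix, EmployeeName) VALUES ('EMP', 'John Doe');
SELECT ID, EmployeeNo FROM dbo.EmployeeMaster;
-- ID = 1, EmployeeNo = 'EMP000000001'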
Just suppose I have a table with a timestamp column that is used via NHibernate to manage record versions. Now I'm looking for a way to update a record without increasing the value of the ts column, because as you know that value increases after each update statement to track the data version and avoid concurrency issues.
CREATE TABLE [dbo].[TSTest](
[ID] [int] NOT NULL,
[Name] [nvarchar](50) NULL,
[ts] [timestamp] NOT NULL,
CONSTRAINT [PK_TSTest] PRIMARY KEY CLUSTERED ( [ID] ASC )
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
Any idea?
If there are certain columns that should not be subject to timestamp tracking then you can move those columns into a new table that then refers back to this original table.
If desired, you can hide the existence of this new table and the modified original table from the application by then producing a view called TSTest that joins the tables together (together with triggers that apply inserts, updates and deletes to the appropriate base tables).
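A rough sketch of that split, with illustrative names and assuming the original TSTest table has first been renamed or dropped: TSTestVersioned keeps the timestamp, TSTestUntracked holds the columns that should not bump it, and a view reunites them under the old name.
CREATE TABLE dbo.TSTestVersioned (
    ID int NOT NULL CONSTRAINT PK_TSTestVersioned PRIMARY KEY,
    ts timestamp NOT NULL
);
CREATE TABLE dbo.TSTestUntracked (
    ID int NOT NULL CONSTRAINT PK_TSTestUntracked PRIMARY KEY
        REFERENCES dbo.TSTestVersioned (ID),
    Name nvarchar(50) NULL
);
GO
-- INSTEAD OF triggers on this view would route inserts/updates/deletes to the base tables
CREATE VIEW dbo.TSTest
AS
SELECT v.ID, u.Name, v.ts
FROM dbo.TSTestVersioned v
JOIN dbo.TSTestUntracked u ON u.ID = v.ID;
GO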
However, in this instance it's not clear what we should do since there's only one obvious "updatable" column - Name - and so if we don't want it subject to timestamp tracking, it's unclear why we have it on this table at all.
Unfortunately, there are no other T-SQL mechanisms to avoid the timestamp behaviour - and this is usually seen as a good thing. You can't do anything via triggers, since if you actually touch the base table the timestamp will get changed, and you're not allowed to UPDATE timestamp columns, so you can't even reset it after the change.
I think it would be possible; nothing is impossible in software.
Leaving aside the existence of things like the halting problem, you are of course correct that this problem can be solved - but not in a way that's likely to be useful to you. As I've said above, it's not possible through T-SQL.
If you really need to do this then you can do it by directly manipulating the database file. Of course, that requires you to detach the database or take the server down, and to then pore through the file structures to manually locate the page(s) the contain the rows you wish to alter, then to apply those changes and then to correct other parts of the structure (such as page checksums) so that SQL Server doesn't believe that the pages are now corrupt.
I'm not really advocating this approach, just outlining how far away from normality you'd have to go to actually perform what you're asking for.
Any idea?
Per your comment and schema, your ts column is [ts] [timestamp] NOT NULL, so it will get modified on every update operation.
One way could be to use an AFTER UPDATE trigger and undo the modification that happened. But why would you do that? Moreover, a trigger on the same table (or a recursive trigger) is not supported in MySQL.
My C# application retrieves over a million records from Sql Server, processes them and then updates the database back. This results in close to 100,000 update statements and they all have the following form -
update Table1 set Col1 = <some number> where Id in (n1, n2, n3....upto n200)
"Id" is an int, primary key with clustered index. No two update statements update the same Ids, so in theory, they can all run in parallel without any locks. Therefore, ideally, I suppose I should run as many as possible in parallel. The expectation is that all finish in no more than 5 minutes.
Now, my question is what is the most efficient way of doing it? I'm trying the below -
Running them sequentially one by one - This is the least efficient solution. Takes over an hour.
Running them in parallel by launching each update in its own thread - Again very inefficient because we're creating thousands of threads, but I tried anyway and it took over an hour, and quite a few of them failed because of this or that connection issue.
Bulk inserting into a new table and then doing a join for the update. But then we run into concurrency issues because more than one user is expected to be doing this.
Merge batches instead of updates - Google says that merge is actually slower than individual update statements so I haven't tried it.
I suppose this must be a very common problem for applications that handle sizeable amounts of data. Are there any standard solutions? Any ideas or suggestions will be appreciated.
I created an integer table type so that I can pass all my ids to the stored procedure as a list, and then a single query updates the whole table.
This is still slow, but it is much quicker than the conventional "where id in (1,2,3)".
definition for TYPE
CREATE TYPE [dbo].[integer_list_tbltype] AS TABLE(
[n] [int] NOT NULL,
PRIMARY KEY CLUSTERED
(
[n] ASC
)WITH (IGNORE_DUP_KEY = OFF)
)
GO
Here is the usage.
declare @intval integer_list_tbltype
declare @colval int = 10
update c
set c.Col1 = @colval
from @intval i
join Table1 c on c.ID = i.n
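In practice the list is passed into a stored procedure through a READONLY table-valued parameter; here is a sketch with a hypothetical procedure name (from C#, ADO.NET sends the parameter as SqlDbType.Structured):
CREATE PROCEDURE dbo.UpdateCol1ForIdList
    @ids dbo.integer_list_tbltype READONLY,
    @colval int
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE c
    SET c.Col1 = @colval
    FROM @ids i
    JOIN Table1 c ON c.ID = i.n;
END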
Let me know if you have any questions.
I am using LINQ along with Entity Framework to insert some data into a SQL Server 2012 database.
My database table into which the data is being inserted has a primary key, and I am inserting about 1000 records at once.
That is, I retrieve data in sets of 1000 rows and save those 1000 rows in one go for performance reasons.
Now the problem is that I may occasionally get a duplicate value for any of the rows in those 1000 rows, and when that happens none of the rows are saved to the database.
Is there any way I can silently ignore that one row and not insert it, while all the other non-duplicate rows get inserted?
Also, I did try querying the database before every insert, but the performance cost of that is too high.
Is there any way I can silently ignore that one row and not insert it, while all the other non-duplicate rows get inserted?
If you can recreate the index in SQL Server, you can ignore duplicates. After the insert, recreate the index without IGNORE_DUP_KEY because it's faster without it.
CREATE TABLE [dbo].[YourTable](
[id] [int] NOT NULL,
PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (IGNORE_DUP_KEY = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
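If the table already exists, a sketch of the same idea (assuming the primary key index is named PK_YourTable, which is not in the definition above) is to toggle the option around the bulk load instead of dropping anything:
ALTER INDEX PK_YourTable ON dbo.YourTable REBUILD WITH (IGNORE_DUP_KEY = ON);
-- ... run the bulk insert here; duplicate keys are discarded with a warning ...
ALTER INDEX PK_YourTable ON dbo.YourTable REBUILD WITH (IGNORE_DUP_KEY = OFF);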
I would suggest that you do a bit of extra processing before the insert statement; without any code to go off of, I will try to show you with pseudocode.
ICollection<record> myRecords = Service.GetMyRecords();
//your processing logic before the insert
ICollection<record> recordsToInsert =Service.BusinessLogic(myRecords);
// Iterate over a snapshot so records can be removed from the original collection safely
foreach(var record in recordsToInsert.ToList())
{
    if(myRecords.Contains(record))
    {
        recordsToInsert.Remove(record);
    }
}
This should ensure you have no records in the recordsToInsert Collection that will trip your DB. This also saves you an attempted insert statement since it doesn't try and fail.
Do one query first that checks whether any of the new records exist like so:
var checks = records.Select(r => r.Id).ToArray();
if (!context.Records.Any(r => checks.Contains(r.Id)))
{
// do the insert
}
After the first check you could refine it to find out which of the 1000 records is the culprit. So the happy scenario will always be pretty quick; only when a duplicate is found will the process be slower.
You can't tell EF to silently ignore one database exception while running one transaction.
I'm wondering if it's possible to "pause" a clustered index whenever bulk data is being written.
The reason is that:
Bulk inserts are slow (10,000 rows/second) if I have a clustered index on "DateTime".
Bulk inserts are fast (180,000 rows/second) if I have an inactive clustered index on "DateTime".
I don't mind if the clustered index is rebuilt overnight, e.g. from 1am to 6am.
You can't disable a clustered index and still use the table.
Since the clustered index IS THE TABLE, having it disabled means you can't access any of the data.
From MSDN:
The data rows of the disabled clustered index cannot be accessed except to drop or rebuild the clustered index.
You can...
disable any nonclustered indexes and rebuild them overnight (see the sketch after this list). This will help greatly
DROP all indexes (including clustered) and insert, then CREATE them overnight. This will render the table basically unusable, though.
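For instance, a sketch with a hypothetical nonclustered index name (IX_MyTable_DateTime on dbo.MyTable):
ALTER INDEX IX_MyTable_DateTime ON dbo.MyTable DISABLE;
-- ... the bulk insert now runs without maintaining that index ...
ALTER INDEX IX_MyTable_DateTime ON dbo.MyTable REBUILD;  -- run in the overnight window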
My preferred solution for this is a little more complicated:
INSERT into a staging table that has the same clustered index key as your target table
then INSERT from staging into the target overnight and update indexes as needed (a sketch follows below)
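A sketch of that staging pattern with illustrative table and column names; the staging table shares the target's clustered key so the overnight move is an ordered insert:
-- Fast daytime path: bulk inserts land in the small staging table
CREATE TABLE dbo.MyData_Staging (
    [DateTime] datetime2(2) NOT NULL,
    Value real NOT NULL
);
CREATE CLUSTERED INDEX CX_MyData_Staging ON dbo.MyData_Staging ([DateTime]);
GO
-- Overnight: move the rows into the real table in clustered-key order, then empty the staging table
INSERT INTO dbo.MyData ([DateTime], Value)
SELECT [DateTime], Value
FROM dbo.MyData_Staging
ORDER BY [DateTime];

TRUNCATE TABLE dbo.MyData_Staging;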