I need your help :)
I have a table in a database (SQL Server 2008 R2). Currently there are around 4M rows.
Consumer apps take rows from there (lock them and process).
To protect rows from being taken by more than one consumer, I'm locking them by writing a flag into the appropriate column...
So, to "lock" a record I do a
SELECT TOP 1 .....
and then an UPDATE on the record with that specific ID.
This operation now takes up to 5 seconds (I tried it in SQL Server Management Studio):
SELECT TOP 1 *
FROM testdb.dbo.myTable
WHERE recordLockedBy is NULL;
How can I speed it up?
Here is the table structure:
CREATE TABLE [dbo].[myTable](
[id] [int] IDENTITY(1,1) NOT NULL,
[num] [varchar](15) NOT NULL,
[date] [datetime] NULL,
[field1] [varchar](150) NULL,
[field2] [varchar](150) NULL,
[field3] [varchar](150) NULL,
[field4] [varchar](150) NULL,
[date2] [datetime] NULL,
[recordLockedBy] [varchar](100) NULL,
[timeLocked] [datetime] NULL,
[field5] [varchar](100) NULL);
Indexes should be placed on any columns you use in your query's where clause. Therefore you should add an index to recordLockedBy.
If you don't know about indexes, look here.
Quick starter for you:
CREATE INDEX IDX_myTable_recordLockedBy
ON dbo.myTable (recordLockedBy);
Does your select statement query by id as well? If so, id should be set as a primary key with a clustered index (the default for PKs, I believe). SQL Server will then be able to jump directly to the record - it should be near instant. Without it, SQL Server will do a table scan, looking at every record in the sequence it appears on disk until it finds the one you're after.
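If it isn't already, a minimal sketch of making id the clustered primary key (the constraint name is an assumption):
ALTER TABLE dbo.myTable
ADD CONSTRAINT PK_myTable PRIMARY KEY CLUSTERED (id);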
This alone won't prevent a race condition on the table, though, and the same row could still be processed by multiple consumers.
Look at the UPDLOCK and READPAST locking hints to handle this case:
http://www.mssqltips.com/sqlservertip/1257/processing-data-queues-in-sql-server-with-readpast-and-updlock/
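A minimal sketch of that pattern against the table above (the @consumerName value is a placeholder); UPDLOCK keeps two consumers from claiming the same row, and READPAST lets a consumer skip rows another one has already locked:
DECLARE @consumerName varchar(100) = 'consumer-1';  -- placeholder for the caller's identity

WITH nextRecord AS (
    SELECT TOP (1) *
    FROM dbo.myTable WITH (UPDLOCK, READPAST, ROWLOCK)
    WHERE recordLockedBy IS NULL
    ORDER BY id
)
UPDATE nextRecord
SET recordLockedBy = @consumerName,
    timeLocked = GETDATE()
OUTPUT inserted.*;   -- returns the claimed row to the consumer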
If the table is used for job scheduling and processing, perhaps you can use MSMQ to solve this problem. You wouldn't need to worry about locking and things like that. It also scales much better in enterprise scenarios, and has many different send/receive modes.
You can learn more about it here:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms711472(v=vs.85).aspx
I am developing an API in .NET Core where I have to save an integer number in a SQL table. The number must be 9 digits long (anything from 000000001 upward), it should never repeat in the future, and I also want this number in memory because I have to use it for some other purposes. After doing some research, one of the most common solutions is using DateTime.Now.Ticks and trimming its length, but the problem is that when concurrent HTTP requests come in, the ticks value might be the same.
One solution is to apply a lock on the method and release it once the data is saved to the database, but this will slow the application down, as locks are expensive.
A second solution is to introduce a new table with the counter initially set to 1; on every HTTP request, first use the UnitOfWork to read the value from the table, increment it by one, and save it, and only then process the next request - but again there is a performance hit, so it's not an optimal solution.
So, is there any other solution that is faster and less expensive?
Thanks in advance
I think you can create a computed column in combination with an IDENTITY (auto-increment) column. The auto-increment ID gives you a unique number, and the computed column pads it to the specific length you need.
An example:
CREATE TABLE [dbo].[EmployeeMaster](
[ID] [int] IDENTITY(1,1) NOT NULL,
[PreFix] [varchar](50) NOT NULL,
[EmployeeNo] AS ([PreFix] + RIGHT('000000000' + CAST(Id AS VARCHAR(9)), 9)) PERSISTED,
[EmployeeName] VARCHAR(50),
CONSTRAINT [PK_AutoInc] PRIMARY KEY ([ID] ASC)
)
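A quick illustration of how it behaves (the inserted values are hypothetical):
INSERT INTO dbo.EmployeeMaster (PreFix, EmployeeName) VALUES ('EMP', 'John Doe');

SELECT ID, EmployeeNo FROM dbo.EmployeeMaster;
-- ID = 1, EmployeeNo = 'EMP000000001'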
This is the issue:
I have a table in which update queries are failing.
When a user clicks a delete button in the application, RowStatus should be set to 0 instead of 1; the data type is bit.
Using SQL Profiler, we can see that the update query reaches the SQL Server, but it is not executed there - and it does not return any exception to the application.
We are using the PK of the table to identify the row to be updated.
We are able to successfully insert values into the table from the application via the web server; only the update queries fail.
We have multiple tables in the application - but the issue is only with this table.
We use Entity Framework for updating the table.
Can anyone please help?
This is the table structure:
[dbo].[TableName](
[PrimaryKey] [int] IDENTITY(1,1) NOT NULL,
[ForeignKey1] [int] NOT NULL,
[ForeignKey2] [int] NOT NULL,
[RowStatus] [bit] NULL,
[CreatedBy] [int] NULL,
[CreationDate] [datetime2](7) NULL,
[UpdatedBy] [int] NOT NULL,
[UpdatedDate] [datetime2](7) NOT NULL
)
This is the query I saw in Profiler:
exec sp_executesql N'UPDATE [dbo].[TableName]
SET [RowStatus] = @0, [UpdatedBy] = @1, [UpdatedDate] = @2
WHERE [PrimaryKey] = @3
',N'@0 bit,@1 int,@2 datetime2(7),@3 int',@0=0,@1=999,@2='2018-05-02 05:20:16.2795067',@3=30
Update: It started working after I changed dbContext.SaveChangesAsync() to dbContext.SaveChanges() to save the changes to the entity.
Has anyone faced this issue before?
If you are using Entity Framework, try cleaning the project, as cached build output may stop your code from executing cleanly.
I forgot to add the 'await' keyword to the method call.
That was what was causing the issue.
So I have this table:
CREATE TABLE [Snapshots].[Crashproof](
[EmoteCountId] [int] IDENTITY(1,1) NOT NULL,
[SnapshotId] [int] NOT NULL,
[Emote] [nvarchar](42) NOT NULL,
[EmoteCountTypeId] [int] NOT NULL,
[Count] [decimal](19, 6) NOT NULL,
CONSTRAINT [PK_SnapshotsCrashproof] PRIMARY KEY CLUSTERED ([EmoteCountId] ASC) ON [PRIMARY],
CONSTRAINT [FK_SnapshotsCrashproof_Snapshots] FOREIGN KEY ([SnapshotId]) REFERENCES [Snapshots].[Snapshots] ([SnapshotId]) ON DELETE CASCADE,
CONSTRAINT [FK_SnapshotsCrashproof_EmoteCountTypes] FOREIGN KEY ([EmoteCountTypeId]) REFERENCES [dbo].[EmoteCountTypes] ([EmoteCountTypeId])
) ON [PRIMARY]
GO
and this code that inserts into it:
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.Default, trans))
{
    bulkCopy.DestinationTableName = "Snapshots.Crashproof";
    bulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping("SnapshotId", "SnapshotId"));
    bulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping("EmoteCountTypeId", "EmoteCountTypeId"));
    bulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping("Emote", "Emote"));
    bulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping("Count", "Count"));

    using (IDataReader reader = ObjectReader.Create(emoteCountTypesToSnapshot))
    {
        bulkCopy.WriteToServer(reader);
    }
}
It runs fine 99.99% of the time (the bulk copy is done every minute); however, I did get an exception once, on the last line (bulkCopy.WriteToServer(reader);):
Violation of PRIMARY KEY constraint...Cannot insert duplicate key in object 'Snapshots.Crashproof'. The duplicate key value is (247125).
I get that bulk inserting directly into the final table is not recommended; I will modify my code to bulk insert into a staging table and then insert from there (see the sketch below). But is that what caused this exception?
I really don't understand how a duplicate key can occur on an Identity field :|
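For reference, a minimal sketch of the staging-table variant mentioned above (the staging table name is an assumption; it simply omits the IDENTITY column so the final table keeps generating EmoteCountId):
CREATE TABLE [Snapshots].[Crashproof_Staging](
    [SnapshotId] [int] NOT NULL,
    [Emote] [nvarchar](42) NOT NULL,
    [EmoteCountTypeId] [int] NOT NULL,
    [Count] [decimal](19, 6) NOT NULL
) ON [PRIMARY]
GO
-- SqlBulkCopy targets the staging table; afterwards the rows are moved in one statement.
INSERT INTO [Snapshots].[Crashproof] ([SnapshotId], [Emote], [EmoteCountTypeId], [Count])
SELECT [SnapshotId], [Emote], [EmoteCountTypeId], [Count]
FROM [Snapshots].[Crashproof_Staging];

TRUNCATE TABLE [Snapshots].[Crashproof_Staging];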
My suggestion is to use an ETL product for these kinds of tasks. They organize everything for you and just do a much cleaner job. Messing around with bulk copies and stored procedures and hand-made staging tables (which I did for years) just looks ugly to me now.
I've been using Pentaho Kettle for the last few years and have totally fallen in love with it. This task would take me mere minutes, maybe under a minute, to implement, and it does all the heavy lifting of sanely moving/transforming data from place to place in big heaps.
I know it's not a direct answer to your question but it's the immediate thing I would suggest if someone came to me at work with this question.
This is probably a race condition on the bulk copy.
If I understand correctly, you are running a bulk copy every minute, and yes, most of the time it will complete correctly, but depending on the batch size you are trying to insert, it can take longer than a minute.
Assuming that, one bulk copy can overlap with the next one, and both end up trying to insert records with the same key at the same time, leading to the violation of the PRIMARY KEY constraint.
I've created an application that has crept into production; it has several tables like the one below.
I have a search query similar to the one below for each table. The database is growing by several thousand rows per day and I'm concerned about performance going forward.
Can anyone suggest how I should re-engineer this process to increase efficiency?
I'm using Entity framework, C# and SQL Server.
Also, is it possible to estimate system resource requirements for a database like this? Let's say, for example, if I had 600,000 rows?
Thanks in advance for the replies!
select top 100 *
from table
where given_name.contains(search)
or family_name.contains(search)
or session_number.contains(search)
Table structure:
[id] [int] IDENTITY(1,1) NOT NULL,
[given_name] [nvarchar](100) NULL,
[family_name] [nvarchar](100) NULL,
[session_number] [nvarchar](100) NULL,
[birth_date] [datetime2](7) NULL,
[start_date] [datetime2](7) NULL,
[reported_date] [datetime2](7) NULL,
[confirmed_date] [datetime2](7) NULL,
[dir_name] [nvarchar](100) NULL,
[info] [text] NULL,
[complete] [bit] NULL,
[approved_by] [uniqueidentifier] NULL,
[reported_by] [uniqueidentifier] NULL,
[code] [nvarchar](10) NULL,
[sex] [bit] NULL,
[emergency] [bit] NULL,
[release] [bit] NULL,
[stop] [bit] NULL,
600,000 rows is not that many rows, so you can actually carry on with your approach.
If the volume increases, there is one definite problem and one potential problem:
The query has a Contains clause that EF translates into SQL following the pattern LIKE '%...%'. The optimizer will not use an index on given_name, family_name, or session_number. You can evaluate SQL Server full-text search, which is not directly supported by EF, but there are libraries (a few lines of code) that enable support for it. You can find one of them here:
http://www.entityframework.info/Home/FullTextSearch
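For reference, a minimal sketch of enabling full-text search on those columns (the catalog name and table name are placeholders, and it assumes a unique index on id, here called PK_table, to use as the key index):
CREATE FULLTEXT CATALOG SearchCatalog AS DEFAULT;
GO
CREATE FULLTEXT INDEX ON dbo.[table] (given_name, family_name, session_number)
    KEY INDEX PK_table
    ON SearchCatalog;
GO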
The second problem is related to optimizing the OR when the right indexes exist (which is not your case!!!). The DBMS can work in 2 ways in that situation:
Table scan, evaluating the WHERE clause row by row;
Retrieve 3 different intermediate result sets (one for every OR condition), then merge them with a UNION (not a UNION ALL, which is less expensive but gives a different result). A UNION is a merge that removes duplicates, so the DBMS may have to sort the intermediate results if there are many records (sorting on the row id, not on the index used to solve the OR clause), or just work on the 3 intermediate results with table scans.
So it is quite an expensive approach in both cases, but if you really need an OR clause it is the best approach. Also, the DBMS works from index statistics, so it will probably make a better choice than a programmer would (we hope so).
If you don't mind duplicated records, you can split the query into 3 different queries and combine them with a SQL UNION (Concat in LINQ), as sketched below. Take care, because a single record that satisfies more than one OR condition will appear multiple times.
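A plain-SQL sketch of that split (the table name and @search value are placeholders; Concat in LINQ corresponds to UNION ALL, which is why duplicates can appear):
SELECT TOP 100 * FROM [table] WHERE given_name LIKE '%' + @search + '%'
UNION ALL
SELECT TOP 100 * FROM [table] WHERE family_name LIKE '%' + @search + '%'
UNION ALL
SELECT TOP 100 * FROM [table] WHERE session_number LIKE '%' + @search + '%';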
I think you can create a stored procedure to handle your search. Also, to avoid the OR you could use Full-Text Search. Then, in the stored procedure, use a Full-Text CONTAINS predicate like this:
CREATE PROCEDURE prc_SearchTable
    @searchTerm VARCHAR(100)
    -- searchTerm should be like *john*
AS
BEGIN
    SELECT *
    FROM theTable
    WHERE CONTAINS((given_name, family_name, session_number), @searchTerm)
END
Make sure you add the wildcards for Full-Text Search: *term* (without spaces).
You can add the stored procedure to EF as described here.
I am building a C# application that inserts 2000 records every second using bulk insert.
The database version is 2008 R2.
The application calls an SP that deletes records once they are more than 2 hours old, in chunks using TOP (10000). This is performed after each insert.
The end user selects records to view in a diagram using date ranges and a selection of 2 to 10 parameter ids.
Since the application will run 24/7 with no downtime, I am concerned about performance issues.
Partitioning is not an option, since the customer doesn't have Enterprise edition.
Is the clustered index definition good?
Is it necessary to implement any index rebuilds/reorganization to increase performance, given that rows are inserted at one end of the table and removed at the other end?
What about updating statistics - is it still an issue in 2008 R2?
I use OPTION (RECOMPILE) to avoid using outdated query plans in the select; is that a good approach?
Are there any table hints that can speed up the SELECT?
Any suggestions around locking strategies?
In addition to the scenario above, I have 3 more tables that work the same way with different timeframes. One inserts every 20 seconds and deletes rows older than 1 week, another inserts every minute and deletes rows older than six weeks, and the last inserts every 5 minutes and deletes rows older than 3 years.
CREATE TABLE [dbo].[BufferShort](
[DateTime] [datetime2](2) NOT NULL,
[ParameterId] [int] NOT NULL,
[BufferStateId] [smallint] NOT NULL,
[Value] [real] NOT NULL,
CONSTRAINT [PK_BufferShort] PRIMARY KEY CLUSTERED
(
[DateTime] ASC,
[ParameterId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
ALTER PROCEDURE [dbo].[DeleteFromBufferShort]
@DateTime DateTime,
@BufferSizeInHours int
AS
BEGIN
DELETE TOP (10000)
FROM BufferShort
FROM BufferStates
WHERE BufferShort.BufferStateId = BufferStates.BufferStateId
AND BufferShort.[DateTime] < @DateTime
AND (BufferStates.BufferStateType = 'A' OR BufferStates.Deleted = 'True')
RETURN 0
END
ALTER PROCEDURE [dbo].[SelectFromBufferShortWithParameterList]
@DateTimeFrom Datetime2(2),
@DateTimeTo Datetime2(2),
@ParameterList varchar(max)
AS
BEGIN
SET NOCOUNT ON;
-- Split ParameterList into a temporary table
SELECT * INTO #TempTable FROM dbo.splitString(@ParameterList, ',');
SELECT *
FROM BufferShort Datapoints
JOIN Parameters P ON P.ParameterId = Datapoints.ParameterId
JOIN #TempTable TT ON TT.Token = P.ElementReference
WHERE Datapoints.[DateTime] BETWEEN @DateTimeFrom AND @DateTimeTo
ORDER BY [DateTime]
OPTION (RECOMPILE)
RETURN 0
END
This is a classic case of penny wise/pound foolish. You are inserting 150 million records per day and you are not using Enterprise.
The main reason not to use a clustered index is that the machine cannot keep up with the quantity of rows being inserted. Otherwise you should always use a clustered index. Whether to use a clustered index is usually argued between those who believe every table should have one and those who believe that perhaps one or two percent of tables should not. (I don't have time to engage in a 'religious' type debate about this - just research the web.) I always go with a clustered index unless the inserts on a table are failing.
I would not use the STATISTICS_NORECOMPUTE clause; I would only turn automatic statistics recomputation off if inserts are failing. Please see Kimberly Tripp's (an MVP and a real SQL Server expert) article at http://sqlmag.com/blog/statisticsnorecompute-when-would-anyone-want-use-it.
I would also not use OPTION (RECOMPILE) unless you see that queries are not using the right indexes (or join types) in the actual query plan. If your query is executed many times per minute/second, this can have an unnecessary impact on the performance of your machine.
The clustered index definition seems good as long as all queries specify at least the leading DateTime column. The index will also maximize insert speed (assuming the times are incremental) as well as reduce fragmentation. You shouldn't need to reorganize/rebuild often.
If you have only the clustered index on this table, I wouldn't expect you to need to update stats frequently, because there isn't another data access path. If you have other indexes and complex queries, verify that the index is branded ascending with the query below; you may need to update stats frequently if it is not branded ascending and you have complex queries:
DBCC TRACEON(2388);
DBCC SHOW_STATISTICS('dbo.BufferShort', 'PK_BufferShort');
DBCC TRACEOFF(2388);
For the @ParameterList, consider a table-valued parameter instead, as sketched below. Specify a primary key of Token on the table type.
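A minimal sketch of that change (the type name is an assumption, and the Token length mirrors the strings produced by splitString):
CREATE TYPE dbo.ParameterList AS TABLE
(
    Token varchar(100) NOT NULL PRIMARY KEY
);
GO
-- The procedure would then take @ParameterList dbo.ParameterList READONLY
-- and join to it directly, replacing the splitString call and the #TempTable:
--     JOIN @ParameterList TT ON TT.Token = P.ElementReference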
I would suggest you introduce the RECOMPILE hint only if needed; I suspect you will get a stable plan with a clustered index seek without it.
If you have blocking problems, consider altering the database to specify the READ_COMMITTED_SNAPSHOT option so that row versioning is used instead of blocking for read consistency, as shown below. Note that this will add 14 bytes of row overhead and use tempdb more heavily, but the concurrency benefits might outweigh the costs.
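A sketch of enabling it (the database name is a placeholder; the ROLLBACK IMMEDIATE clause kicks out other connections so the ALTER can complete):
ALTER DATABASE [YourDatabase]
SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE;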