I have a database with a large number of records that are Date/Time stamped. I need to traverse through these records (in chronological order) and perform some analysis on them.
The database is too large to pull in every record at once, so I thought of pulling in a few weeks/days/hours/etc at a time. The problem I'm having is that no matter what I've tried, the database (SQL Server) just uses all the memory on my machine. Even after the application is closed, sqlservr.exe is still using all of my memory. It typically uses about 1.8 GB of memory, no matter if my "batches" only contain 10 records or 1,000,000.
The question is: How can I query the database to get "batches" of records at a time, without the database consuming every bit of memory?
I am using the System.Data.SqlClient libraries. Here is a bit of pseudo-code:
String file = "C:\\db.mdf";
String connString = @"Data Source=.\SQLExpress;AttachDbFilename=" + file + ";Integrated Security=True;User Instance=True";
SqlConnection conn = new SqlConnection(connString);
conn.Open();

DateTime start = DateTime.MinValue;
DateTime end = start.AddHours(1);

while (start < DateTime.MaxValue) // pseudo-condition: until all records have been processed
{
    // This should query for 1 hour at a time (but I should be able to change the time interval).
    // I would like for the memory usage to be proportional to the time interval.
    String query = "SELECT * FROM MyTable WHERE Date BETWEEN '" + start.ToString() + "' AND '" + end.ToString() + "'";
    SqlCommand cmd = new SqlCommand(query, conn);
    SqlDataReader reader = cmd.ExecuteReader();

    while (reader.Read())
        ProcessRecord(reader);

    reader.Close();
    start = end;
    end = end.AddHours(1);
}
conn.Close();
C#
.NET 3.5
SQL Server 2008
Thanks.
This is normal: SQL Server will use all available memory unless configured otherwise.
SQL Server Express will release the memory when your other applications request more, but it will try to use all the memory it can to cache query plans and data.
Quote from the documentation:
The following example sets the max server memory option to 4 GB:
exec sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO
exec sp_configure 'max server memory', 4096;
GO
RECONFIGURE;
GO
exec sp_configure 'show advanced options', 0;
RECONFIGURE;
GO
Do note that SqlConnection, SqlCommand and SqlDataReader implement IDisposable, so you would usually want to wrap them in a using statement.
Filip's answer is correct: that's how SQL Server is designed to behave.
The reason that's the default is that database servers are typically run on a dedicated machine running almost nothing except the database, where the number one concern is database speed. You usually want to keep as much as possible in memory to minimize how often the server needs to hit the disk.
As an alternative to configuring the setting programmatically, you can use SQL Server Management Studio (SSMS): connect to your server, right-click on it and go to Properties. On the Memory page you can configure the maximum memory the instance will use.
A SqlDataReader will stream the results. As long as you don't hold on to the data returned from the reader, the .NET garbage collector will collect it (at non-deterministic times). In other words, your while (reader.Read()) ProcessRecord(reader); will work just fine: .NET will not load the complete result set into memory (unless you do so explicitly, for instance by using a DataSet or DataTable).
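For illustration, a minimal streaming sketch (reusing the MyTable, Date and ProcessRecord names from the question; the parameterized query and the connString/start/end variables are assumptions, not the OP's exact code):

using (SqlConnection conn = new SqlConnection(connString))
using (SqlCommand cmd = new SqlCommand(
    "SELECT * FROM MyTable WHERE Date >= @start AND Date < @end ORDER BY Date", conn))
{
    cmd.Parameters.AddWithValue("@start", start);
    cmd.Parameters.AddWithValue("@end", end);
    conn.Open();
    using (SqlDataReader reader = cmd.ExecuteReader())
    {
        // Only the current row is held in managed memory; rows already
        // processed become eligible for garbage collection.
        while (reader.Read())
            ProcessRecord(reader);
    }
}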
Related
I am using SqlCommand to insert multiple records into the database, but it takes a long time to insert 2000 records. I did the following code:
using (SqlConnection sql = new SqlConnection(connectionString))
{
    using (SqlCommand cmd = new SqlCommand(query, sql))
    {
        sql.Open();
        int ff = 0;
        while (ff < 2000)
        {
            cmd.ExecuteNonQuery(); // takes approximately 139 milliseconds
            ff++;
            Console.WriteLine(ff);
        }
    }
}
But when I execute the following script in SSMS (SQL Server Management Studio), the 2000 records are stored in 15 seconds:
declare @i int;
set @i = 1;
while (@i) <= 2000
begin
    set @i = @i + 1;
    INSERT INTO Fulls (ts,MOTOR,HMI,SET_WEIGHT) VALUES ('2018-07-04 02:56:57','0','0','0');
end
What's going on?
Why is it so slow to execute the statement?
Additional information:

- The database is a SQL Database hosted in Microsoft Azure.
- My internet upload speed is 20 Mbit/s.
- The above query is not the real query; the real one contains 240 columns and 240 values.
- I tried to use a transaction following this example: https://msdn.microsoft.com/en-us/library/86773566(v=vs.110).aspx
- The sql variable is of type SqlConnection.
Thanks for your help.
It sounds like there is high latency between your SQL server and your application server. When I do this locally, the pure T-SQL version runs in 363 ms, and the C# version with 2000 round trips takes 1061 ms (so about 0.53 ms per round trip). Note: I took the Console.WriteLine away, because I didn't want to measure how fast Console isn't!
For 2000 inserts, this is a pretty fair comparison. If you're seeing something massively different, then I suspect:

- your SQL server is horribly under-powered - it should not take 15 s (from the question) to insert 2000 rows under any circumstances (my 363 ms timing is on my desktop PC, not a fast server), or
- as already suggested: you have high latency
Note there are also things like "DTC" which might impact the performance based on the connection string and ambient transactions (TransactionScope), but I'm assuming those aren't factors here.
If you need to improve the performance here, the first thing to do would be to find out why it is so horribly bad, i.e. whether the raw server performance is terrible or the latency is huge. Neither of those is a coding question: those are infrastructure questions.
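To gauge the round-trip latency from the application side, a rough sketch (timing a trivial query; the connection string variable and loop count are illustrative):

// Rough latency probe: average the cost of a trivial round trip.
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("SELECT 1", conn))
{
    conn.Open();
    var sw = System.Diagnostics.Stopwatch.StartNew();
    for (int i = 0; i < 100; i++)
        cmd.ExecuteScalar();
    sw.Stop();
    Console.WriteLine("avg round trip: {0:F2} ms",
        sw.Elapsed.TotalMilliseconds / 100);
}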
If you can't fix those, then you can code around them. Table-valued parameters and bulk insert (SqlBulkCopy) both provide ways to transfer multiple rows without paying a round trip per execute. You can also use "MARS" (multiple active result sets) and pipelined inserts, but that is quite an advanced topic (and most people tend to recommend not enabling MARS).
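As a minimal table-valued parameter sketch (the dbo.FullsRow type is an assumption and would have to be created to match the Fulls table from the question):

// Assumed type, created once on the server:
// CREATE TYPE dbo.FullsRow AS TABLE
//     (ts datetime, MOTOR int, HMI int, SET_WEIGHT int);
DataTable rows = new DataTable();
rows.Columns.Add("ts", typeof(DateTime));
rows.Columns.Add("MOTOR", typeof(int));
rows.Columns.Add("HMI", typeof(int));
rows.Columns.Add("SET_WEIGHT", typeof(int));
for (int i = 0; i < 2000; i++)
    rows.Rows.Add(DateTime.UtcNow, 0, 0, 0);

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(
    "INSERT INTO Fulls (ts,MOTOR,HMI,SET_WEIGHT) " +
    "SELECT ts,MOTOR,HMI,SET_WEIGHT FROM @rows", conn))
{
    SqlParameter p = cmd.Parameters.AddWithValue("@rows", rows);
    p.SqlDbType = SqlDbType.Structured;
    p.TypeName = "dbo.FullsRow";
    conn.Open();
    cmd.ExecuteNonQuery(); // one round trip for all 2000 rows
}

This sends all the rows in a single command, so the per-call latency is paid once instead of 2000 times.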
Make sure to minimize the number of indexes on your table, and use SqlBulkCopy as below:
DataTable sourceData = new DataTable();
// ... fill sourceData with the rows to insert ...

using (var sqlBulkCopy = new SqlBulkCopy(_connString))
{
    sqlBulkCopy.DestinationTableName = "DestinationTableName";
    sqlBulkCopy.WriteToServer(sourceData);
}
I have a CLR stored procedure running on SQL Server 2014. When I execute the following code, the data reader only returns the top row of the result set. The SQL, when run by itself, returns more than one row. I have also tried filling a DataTable with a SqlDataAdapter, but I still only get one row.
using (SqlConnection conn = new SqlConnection("context connection=true"))
using (SqlCommand cmd = new SqlCommand("SELECT * FROM some_table", conn))
{
    cmd.CommandType = CommandType.Text;
    conn.Open();
    SqlDataReader reader = cmd.ExecuteReader();
    if (reader.HasRows)
    {
        while (reader.Read())
            SqlContext.Pipe.Send(reader.GetInt32(0).ToString());
    }
    reader.Close();
    conn.Close();
}
Thank you for any help in advance. This truly has me baffled, as it is the simplest of things.
There is nothing inherently wrong with this code. I ran it myself on SQL Server 2014, compiled against .NET Framework version 4.5.2 and it works as you are expecting it to. And just to be clear, for what you are doing here, the version of SQL Server and .NET Framework don't really matter.
I also tried passing in CommandBehavior.SingleRow to cmd.ExecuteReader() and that still returned all rows.
Since this code does work, here are some things to check:

- Make sure that you are publishing to the same DB (and instance) that you are running the SELECT statement in via SSMS.
- Make sure that you have published the most recent version of this code to the DB.
- Other external factors
Also, please create the SqlDataReader within a using() construct, as it implements IDisposable.
UPDATE FROM O.P.:
The SQLCLR Stored Procedure is being called from a T-SQL Stored Procedure that had issued SET ROWCOUNT 1; prior to calling the SQLCLR Stored Procedure. Hence, only 1 row could be returned by any query. The 1 row problem was fixed by issuing SET ROWCOUNT 0; prior to calling the SQLCLR Stored Procedure.
Please note: it is generally preferred to use the TOP (n) clause instead of SET ROWCOUNT n;.
Beware! Microsoft have a bug here:
https://social.msdn.microsoft.com/Forums/sqlserver/en-US/83b23289-a24b-4f82-b81f-3e8ccf6d8001/is-this-serious-sqlclr-bug-present-in-sql-server-2012?forum=sqlnetfx
The SqlDataReader class actually briefly grabs a lock, yet it should not, because SQLCLR-approved code should never take locks.
I submitted a sample to MS that proved this: a very simple loop using only approved SQL/CLR classes/methods would cause an AppDomain unload when an SSMS user hit cancel - totally catastrophic.
Here's the reference source; take a look and you'll find "lock" statements - illegal for SQL/CLR, yet they approved this class!
https://referencesource.microsoft.com/#system.data/system/Data/SqlClient/SqlDataReader.cs
This says/implies it's approved:
https://msdn.microsoft.com/en-us/library/ms131094.aspx
Yet this fails to include it on the most recent approved list:
https://learn.microsoft.com/en-us/sql/relational-databases/clr-integration/database-objects/supported-net-framework-libraries
(Oddly, OracleClient is on the list!)
I am new to database interaction with C#. I am trying to write 10000 records to the database in a loop using SqlCommand and SqlConnection objects, with a SqlTransaction that is committed after every 5000 records. It takes 10 seconds to process.
SqlConnection myConnection = new SqlConnection("..Connection String..");
myConnection.Open();

SqlCommand myCommand = new SqlCommand();
myCommand.CommandText = "exec StoredProcedureInsertOneRowInTable Param1, Param2........";
myCommand.Connection = myConnection;

SqlTransaction myTrans = myConnection.BeginTransaction();
myCommand.Transaction = myTrans;

for (int i = 0; i < 10000; i++)
{
    myCommand.ExecuteNonQuery();
    if (i % 5000 == 0)
    {
        myTrans.Commit();
        myTrans = myConnection.BeginTransaction();
        myCommand.Transaction = myTrans;
    }
}
The above code gives me only 1000 row writes per second in the database.
But when I implemented the same logic in T-SQL and executed it in SQL Server Management Studio, it gave me 10000 writes per second.
When I compared the behaviour of the two approaches, it showed me that the ADO.NET execution performs a large number of logical reads.
My questions are:

1. Why are there logical reads in the ADO.NET execution?
2. Do transactions involve some handshaking?
3. Why are the logical reads not present when using Management Studio?
4. If I want very fast insert transactions on the DB, what approach should I take?
Updated information about the database objects
Table: tbl_FastInsertTest
No primary key; only 5 fields, the first three of type int (F1, F2, F3) and the last two (F4, F5) of type varchar(30)
Stored procedure:

create proc stp_FastInsertTest
(
    @nF1 int,
    @nF2 int,
    @nF3 int,
    @sF4 varchar(30),
    @sF5 varchar(30)
)
as
begin
    set nocount on;

    insert into tbl_FastInsertTest
    (
        [F1],
        [F2],
        [F3],
        [F4],
        [F5]
    )
    values
    (
        @nF1,
        @nF2,
        @nF3,
        @sF4,
        @sF5
    );
end
--------------------------------------------------------------------------------------
SQL block executed in SSMS
-- When I execute the following code in SSMS it gives me more than 10000 writes per second,
-- but when I execute the same stored procedure through ADO.NET it gives me 1000 to 1200 writes per second.
-- While reading, there are no locks.
begin tran
declare @i int
set @i = 0
while (1 <> 0)
begin
    exec stp_FastInsertTest 1, 2, 3, 'vikram', 'varma'
    set @i = @i + 1
    if (@i = 5000)
    begin
        commit tran
        set @i = 0
        begin tran
    end
end
If you are running something like:
exec StoredProcedureInsertOneRowInTable 'blah', ...
exec StoredProcedureInsertOneRowInTable 'bloop', ...
exec StoredProcedureInsertOneRowInTable 'more', ...
in SSMS, that is an entirely different scenario, where all of that is a single batch. With ADO.NET you are paying a round-trip per ExecuteNonQuery - I'm actually impressed it managed 1000/s.
Re the logical reads, that could just be looking at the query-plan cache, but without knowing more about StoredProcedureInsertOneRowInTable it is impossible to comment on whether something query-specific is afoot. But I suspect you have some different SET conditions between SSMS and ADO.NET that is forcing it to use a different plan - this is in particular a problem with things like persisted calculated indexed columns, and columns "promoted" out of a sql-xml field.
Re making it faster - in this case it sounds like table-valued parameters are exactly the thing, but you should also review the other options here.
For performant inserts, take a look at the SqlBulkCopy class; if it works for you, it should be fast.
As Sean said, using parameterized queries is always a good idea.
Using a StringBuilder, batching a thousand INSERT statements into a single query and committing the transaction is a proven way of inserting data:
var sb = new StringBuilder();
for (int i = 0; i < 1000; i++)
{
    sb.AppendFormat("INSERT INTO [Table] (col1, col2) VALUES ({0}, {1});",
        values1[i], values2[i]);
}
sqlCommand.CommandText = sb.ToString();
Your code doesn't look right to me: the transaction opened after the last commit is never committed, so the rows inserted since that commit are never persisted.
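A sketch of the corrected loop (same names as in the question), committing whatever remains once the loop ends:

SqlTransaction myTrans = myConnection.BeginTransaction();
myCommand.Transaction = myTrans;

for (int i = 1; i <= 10000; i++)
{
    myCommand.ExecuteNonQuery();
    if (i % 5000 == 0) // commit every full batch of 5000
    {
        myTrans.Commit();
        myTrans = myConnection.BeginTransaction();
        myCommand.Transaction = myTrans;
    }
}
myTrans.Commit(); // commit the final partial batch, if any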
It is always a good practice to drop indexes while inserting a lot of data, and adding them later. Indexes will slow down your writes.
SQL Server Management Studio runs in autocommit mode by default (each statement commits on its own), but T-SQL has explicit transactions; try this:
BEGIN TRANSACTION MyTransaction
INSERT INTO [Table] (Col1, Col2) VALUES (Val10, Val20);
INSERT INTO [Table] (Col1, Col2) VALUES (Val11, Val21);
INSERT INTO [Table] (Col1, Col2) VALUES (Val12, Val22);
COMMIT TRANSACTION
You need to use a parameterized query so that the execution plan can be processed and cached. Since you're using string concatenation (shudder, this is bad, google sql injection) to build the query, SQL Server treats those 10,000 queries as separate, individual queries and builds an execution plan for each one.
MSDN: http://msdn.microsoft.com/en-us/library/yy6y35y8.aspx although you're going to want to simplify their code a bit and you'll have to reset the parameters on the command.
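A minimal sketch of that pattern, reusing the tbl_FastInsertTest columns from the question (only two columns shown; the parameter names are illustrative):

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(
    "INSERT INTO tbl_FastInsertTest (F1, F4) VALUES (@f1, @f4)", conn))
{
    // Define the parameters once; the execution plan is compiled once and reused.
    cmd.Parameters.Add("@f1", SqlDbType.Int);
    cmd.Parameters.Add("@f4", SqlDbType.VarChar, 30);
    conn.Open();
    for (int i = 0; i < 10000; i++)
    {
        cmd.Parameters["@f1"].Value = i;
        cmd.Parameters["@f4"].Value = "row " + i;
        cmd.ExecuteNonQuery();
    }
}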
If you really, really want to get the data into the db fast, think about using bcp... but you'd better make sure the data is clean first (as there's no real error checking/handling on it).
I have a problem when I try to read rows from SQL Server 2005 from C# code.
The idea:
In my database (SQL Server 2005 Express) there is a table with a column (of datatype ntext) containing HTML code.
In my C# application a user can enter a sentence (HTML code) and search for the rows that contain this sentence.
The query generated from my app is:
USE test
SELECT
al.aal_Id As ID,
al.aal_Description As Opis,
au.au_Title As Tytul_szablonu,
au.au_Note As Nazwa_szablonu
FROM dbo.au_Allegro al
LEFT OUTER JOIN dbo.au__Auction au ON (al.aal_AuctionId = au.au_Id)
WHERE
au.au_Type = 11
AND al.aal_Description COLLATE SQL_Latin1_General_CP1_CS_AS LIKE '%%' ESCAPE '\'
In my app I'm converting special characters (e.g. the quote character) and adding the escape character.
A user tried to search for a very long sentence (about 7000+ chars); when he does this, the sqlservr.exe process consumes all of his RAM and the search takes 30+ minutes (he has about 1000+ rows in this table).
The query returns 0 rows.
When he runs the (same) query in SQL Server Management Studio, the database returns results in a few seconds (with rows).
In my app I use SqlDataAdapter:
System.Data.DataTable dt = new System.Data.DataTable();
System.Data.SqlClient.SqlCommand cmd = new System.Data.SqlClient.SqlCommand();
cmd.CommandTimeout = 0;
cmd.Connection = conn;

// kwerenda holds the generated query string shown above
System.Data.SqlClient.SqlDataAdapter da = new System.Data.SqlClient.SqlDataAdapter(kwerenda, conn);
try
{
    da.Fill(dt);
}
I tried SqlDataReader:
dr = cmd.ExecuteReader();
while (dr.Read())
{
    string id = dr["ID"].ToString();
    string opis = dr["Opis"].ToString();
    string tytul = dr["Tytul_szablonu"].ToString();
    string nazwa = dr["Nazwa_szablonu"].ToString();
    dt.Rows.Add(id, opis, tytul, nazwa);
}
When I tried to simulate this in my test database, I didn't have any problems searching for the same sentences.
Have you got any tips for me?
I can't make any changes to the user's data table, and I can't go to him and check what happens.
Is the SQL command executing a stored procedure? If so you might be getting different query plans, which may explain the timing difference between the apps. Your ADO.Net call might be affected by something known as parameter sniffing, which can cause radically different query execution times.
There are a couple of things you can do to avoid this problem and yield consistent results:

- Convert the parameters to local variables inside the stored procedure.
- Disable the feature on the SQL Server altogether.
Also, your syntax looks suspect, as John pointed out. It would be better to use the NVARCHAR(MAX) datatype for that column if possible; NTEXT should be avoided, as it has been deprecated.
A better alternative to LIKE searches on a non-indexed column like this is to utilize SQL Server's Full-Text Search, which is optimized for these types of queries:
http://msdn.microsoft.com/en-us/library/ms142571.aspx
http://www.developer.com/article.php/3446891
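As a rough sketch of what the query side could then look like (assuming a full-text index has been created on aal_Description; the column types and parameter plumbing are illustrative):

using (var cmd = new SqlCommand(
    "SELECT al.aal_Id FROM dbo.au_Allegro al " +
    "WHERE CONTAINS(al.aal_Description, @phrase)", conn))
{
    // Quote the phrase so it is matched as a whole, not word by word.
    cmd.Parameters.AddWithValue("@phrase", "\"" + sentence + "\"");
    using (SqlDataReader dr = cmd.ExecuteReader())
    {
        while (dr.Read())
            Console.WriteLine(dr.GetInt32(0));
    }
}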
A couple of things you might want to do.
First, don't use ntext. SQL 2005 has a datatype called nvarchar(max), which is MUCH better for storing large amounts of text. Further, ntext has been deprecated, so save yourself some trouble and convert now. See this link on how to do this successfully.
Second, the query you posted is unusual. You have a left outer join, but also a where clause on the outer-joined table. Because of the where clause it's being converted (hopefully) into an inner join. You should either just write it that way OR move au.au_Type = 11 into the join condition. I doubt you want the latter.
Third, when the client runs the query for the first time through your app, it generates a query plan based on those parameters. Running the exact same query shortly thereafter in Management Studio will reuse that plan and the cached data, so the second pass will be fast - no surprise there.
Fourth, I don't think you posted the actual query that was run. I suspect there is some data in the parameter you are comparing which either isn't escaped properly OR uses one of the reserved characters such as '[', ']', '^', etc.
I want to back up a table, saving the copy in the same database under another name. I want to do it programmatically using .NET 2.0 (preferably C#). Can someone point me to what I should do?
Just send this query to the server:
SELECT * INTO [BackupTable] FROM [OriginalTable]
This will create the backup table from scratch (an error will be thrown if it already exists). For large tables be prepared for it to take a while. This should mimic datatypes, collation, and NULLness (NULL or NOT NULL), but will not copy indexes, keys, or similar constraints.
If you need help sending sql queries to the database, that's a different issue.
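For completeness, a minimal sketch of sending that statement from C# (the connection string variable is a placeholder):

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(
    "SELECT * INTO [BackupTable] FROM [OriginalTable]", conn))
{
    conn.Open();
    cmd.CommandTimeout = 0; // large tables can take a while
    cmd.ExecuteNonQuery();
}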
One way to do this would be to simply execute a normal query using SELECT ... INTO:
SELECT *
INTO NewTableName
FROM ExistingTableName
This automatically creates a new table and inserts the rows of the old one.
Another way would be to use SqlBulkCopy from the System.Data.SqlClient namespace. There is a nice CodeProject article explaining how to do this:
SQL Bulk Copy with C#.Net
Programmers usually need to transfer production data for testing or analyzing. The simplest way to copy lots of data from any resource to SQL Server is bulk copying. .NET Framework 2.0 contains a class in the ADO.NET "System.Data.SqlClient" namespace: SqlBulkCopy. The bulk copy operation usually has two separate phases.

In the first phase you get the source data. The source could be various data platforms such as Access, Excel, SQL... You must get the source data in your code, wrapping it in a DataTable or any DataReader class which implements IDataReader. After that, in the second phase, you must connect to the target SQL database and perform the bulk copy operation.

The bulk copy operation in .NET is a very fast way to copy large amounts of data to SQL Server. The reason for that is SQL Server's bulk copy mechanism. Inserting all data row by row, one after the other, consumes a lot of time and system resources, but the bulk copy mechanism processes all the data at once, so the inserts become very fast.
The code is pretty straightforward:
// Establishing connection
SqlConnectionStringBuilder cb = new SqlConnectionStringBuilder();
cb.DataSource = "SQLProduction";
cb.InitialCatalog = "Sales";
cb.IntegratedSecurity = true;
SqlConnection cnn = new SqlConnection(cb.ConnectionString);
// Getting source data
SqlCommand cmd = new SqlCommand("SELECT * FROM PendingOrders",cnn);
cnn.Open();
SqlDataReader rdr = cmd.ExecuteReader();
// Initializing an SqlBulkCopy object
SqlBulkCopy sbc = new SqlBulkCopy("server=.;database=ProductionTest;" +
"Integrated Security=SSPI");
// Copying data to destination
sbc.DestinationTableName = "Temp";
sbc.WriteToServer(rdr);
// Closing connection and the others
sbc.Close();
rdr.Close();
cnn.Close();
You could use SQL Server Management Objects (SMO) to make a copy of a database (data and schema). There are a lot of options you can set. The following example copies the entire database:
// Requires a reference to the SMO assemblies, e.g.:
// using Microsoft.SqlServer.Management.Smo;

// Connect to the server
Server server = new Server(".");
// Get the database to copy
Database db = server.Databases["MyDatabase"];
// Set options
Transfer transfer = new Transfer(db);
transfer.CopyAllObjects = true;
transfer.DropDestinationObjectsFirst = true;
transfer.CopySchema = true;
transfer.CopyData = true;
transfer.DestinationServer = ".";
transfer.DestinationDatabase = "MyBackupDatabase";
transfer.Options.IncludeIfNotExists = true;
// Transfer Schema and Data
transfer.TransferData();
You can find the documentation of the Transfer Class on MSDN.
Depending on how many records are in the table, this could be a very bad idea to do from C# and the user interface.
For a small table, use the following SQL:

create table table2 (field1 int, field2 varchar(10)) -- use the actual field names and datatypes, of course
insert into table2 (field1, field2)
select field1, field2 from table1

I suggest the CREATE TABLE to create the table once and then the INSERT, so that you can add records to the table multiple times; SELECT INTO will only work once.
At the very least, you could do "SELECT * INTO NEWTable FROM OldTable".
Do you want to create all the indexes/constraints etc?
EDIT: Adding to splattne's comments, you will have to get a handle to the Table instance of the table you wish to copy. Use the Script method to get the script it generates, modify the script string to replace the old names with new ones, and run it on the DB.
EDIT2: http://msdn.microsoft.com/en-us/library/microsoft.sqlserver.management.smo.table.table.aspx
http://msdn.microsoft.com/en-us/library/microsoft.sqlserver.management.smo.tableviewtabletypebase.script.aspx
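A hedged sketch of the approach those edits describe (SMO types; the database and table names are placeholders):

// using Microsoft.SqlServer.Management.Smo;
// using System.Collections.Specialized;
Server server = new Server(".");
Table source = server.Databases["MyDatabase"].Tables["OriginalTable"];

// Script() returns the CREATE TABLE statement(s) for the source table.
StringCollection script = source.Script();
foreach (string statement in script)
{
    // Replace the old table name with the new one, then execute the
    // modified statement against the database (e.g. via SqlCommand).
    string createCopy = statement.Replace("OriginalTable", "BackupTable");
    // ... execute createCopy ...
}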