I can't fetch more than 700,000 rows from SQL Server using C#; I get an "out-of-memory" exception. Please help me out.
This is my code:
using (SqlConnection sourceConnection = new SqlConnection(constr))
{
sourceConnection.Open();
SqlCommand commandSourceData = new SqlCommand("select * from XXXX ", sourceConnection);
reader = commandSourceData.ExecuteReader();
}
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(constr2))
{
bulkCopy.DestinationTableName = "destinationTable";
try
{
// Write from the source to the destination.
bulkCopy.WriteToServer(reader);
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
finally
{
reader.Close();
}
}
I made a small console app based on solution 1, but it ends with the same exception. I have also posted my process memory usage before and after.
Before processing:
After adding the command timeout on the read side, RAM usage peaks:
That code should not cause an OOM exception. When you pass a DataReader to SqlBulkCopy.WriteToServer you are streaming the rows from the source to the destination. Somewhere else you are retaining stuff in memory.
SqlBulkCopy.BatchSize controls how often SQL Server commits the rows loaded at the destination, limiting the lock duration and the log file growth (if not minimally logged and in simple recovery mode). Whether you use one batch or not should have no impact on the amount of memory used either in SQL Server or in the client.
Here's a sample that copies 10M rows without growing memory:
using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace SqlBulkCopyTest
{
class Program
{
static void Main(string[] args)
{
var src = "server=localhost;database=tempdb;integrated security=true";
var dest = src;
var sql = "select top (1000*1000*10) m.* from sys.messages m, sys.messages m2";
var destTable = "dest";
using (var con = new SqlConnection(dest))
{
con.Open();
var cmd = con.CreateCommand();
cmd.CommandText = $"drop table if exists {destTable}; with q as ({sql}) select * into {destTable} from q where 1=2";
cmd.ExecuteNonQuery();
}
Copy(src, dest, sql, destTable);
Console.WriteLine("Complete. Hit any key to exit.");
Console.ReadKey();
}
static void Copy(string sourceConnectionString, string destinationConnectionString, string query, string destinationTable)
{
using (SqlConnection sourceConnection = new SqlConnection(sourceConnectionString))
{
sourceConnection.Open();
SqlCommand commandSourceData = new SqlCommand(query, sourceConnection);
var reader = commandSourceData.ExecuteReader();
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(destinationConnectionString))
{
bulkCopy.BulkCopyTimeout = 60 * 10;
bulkCopy.DestinationTableName = destinationTable;
bulkCopy.NotifyAfter = 10000;
bulkCopy.SqlRowsCopied += (s, a) =>
{
var mem = GC.GetTotalMemory(false);
Console.WriteLine($"{a.RowsCopied:N0} rows copied. Memory {mem:N0}");
};
// Write from the source to the destination.
bulkCopy.WriteToServer(reader);
}
}
}
}
}
Which outputs:
. . .
9,830,000 rows copied. Memory 1,756,828
9,840,000 rows copied. Memory 798,364
9,850,000 rows copied. Memory 4,042,396
9,860,000 rows copied. Memory 3,092,124
9,870,000 rows copied. Memory 2,133,660
9,880,000 rows copied. Memory 1,183,388
9,890,000 rows copied. Memory 3,673,756
9,900,000 rows copied. Memory 1,601,044
9,910,000 rows copied. Memory 3,722,772
9,920,000 rows copied. Memory 1,642,052
9,930,000 rows copied. Memory 3,763,780
9,940,000 rows copied. Memory 1,691,204
9,950,000 rows copied. Memory 3,812,932
9,960,000 rows copied. Memory 1,740,356
9,970,000 rows copied. Memory 3,862,084
9,980,000 rows copied. Memory 1,789,508
9,990,000 rows copied. Memory 3,903,044
10,000,000 rows copied. Memory 1,830,468
Complete. Hit any key to exit.
NB: Per DavidBrowne's answer, it seems I'd misunderstood how the batching of the SqlBulkCopy class works. The refactored code may still be useful to you, so I've not deleted this answer (as the code is still valid), but the answer is not to set the BatchSize as I'd believed. Please see David's answer for an explanation.
Try something like this; the key being setting the BatchSize property to limit how many rows you deal with at once:
using (SqlConnection sourceConnection = new SqlConnection(constr))
{
sourceConnection.Open();
SqlCommand commandSourceData = new SqlCommand("select * from XXXX ", sourceConnection);
using (reader = commandSourceData.ExecuteReader()) { //add a using statement for your reader so you don't need to worry about close/dispose
//keep the connection open or we'll be trying to read from a closed connection
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(constr2))
{
bulkCopy.BatchSize = 1000; //Write a few pages at a time rather than all at once; thus lowering memory impact. See https://learn.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqlbulkcopy.batchsize?view=netframework-4.7.2
bulkCopy.DestinationTableName = "destinationTable";
try
{
// Write from the source to the destination.
bulkCopy.WriteToServer(reader);
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
throw; //we've caught the top-level Exception rather than something specific, so once we've logged it, rethrow it for a proper handler to deal with up the call stack
}
}
}
}
Note that because the SqlBulkCopy class takes an IDataReader as an argument we don't need to download the full data set. Instead, the reader gives us a way to pull back records as required (hence us leaving the connection open after creating the reader). When we call the SqlBulkCopy's WriteToServer method, internally it has logic to loop multiple times, selecting BatchSize new records from the reader, then pushing those to the destination table before repeating / completing once the reader has sent all pending records. This works differently to, say, a DataTable, where we'd have to populate the data table with the full set of records, rather than being able to read more back as required.
One potential risk of this approach is, because we have to keep the connection open, any locks on our source are kept in place until we close our reader. Depending on the isolation level and whether other queries are trying to access the same records, this may cause blocking; whilst the data table approach would have taken a one-off copy of the data into memory and then closed the connection, avoiding any blocks. If this blocking is a concern you should look at changing the isolation level of your query, or applying hints... Exactly how you approach that would depend on the requirements though.
NB: In reality, instead of running the above code as is, you'd want to refactor things a bit, so the scope of each method is contained. That way you can reuse this logic to copy other queries to other tables.
You'd also want to make the batch size configurable rather than hard-coded so you can adjust to a value that gives a good balance of resource usage vs performance (which will vary based on the host's resources).
You may also want to use async methods, to allow other parts of your program to progress whilst you're waiting on data to flow from/to your databases.
Here's a slightly amended version:
public async Task<SqlDataReader> ExecuteReaderAsync(string connectionString, string query)
{
SqlConnection connection = null; //initialised to null so the catch block can reference them safely
SqlCommand command = null;
try
{
connection = new SqlConnection(connectionString); //not in a using as we want to keep the connection open until our reader's finished with it.
await connection.OpenAsync();
command = new SqlCommand(query, connection);
return await command.ExecuteReaderAsync(CommandBehavior.CloseConnection); //tell our reader to close the connection when done.
}
catch
{
//if we have an issue before we've returned our reader, dispose of our objects here
command?.Dispose();
connection?.Dispose();
//then rethrow the exception
throw;
}
}
public async Task CopySqlDataAsync(string sourceConnectionString, string sourceQuery, string destinationConnectionString, string destinationTableName, int batchSize)
{
using (var reader = await ExecuteReaderAsync(sourceConnectionString, sourceQuery))
await CopySqlDataAsync(reader, destinationConnectionString, destinationTableName, batchSize);
}
public async Task CopySqlDataAsync(IDataReader sourceReader, string destinationConnectionString, string destinationTableName, int batchSize)
{
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(destinationConnectionString))
{
bulkCopy.BatchSize = batchSize;
bulkCopy.DestinationTableName = destinationTableName;
await bulkCopy.WriteToServerAsync(sourceReader);
}
}
public void CopySqlDataExample()
{
try
{
var constr = ""; //todo: define connection string; ideally pulling from config
var constr2 = ""; //todo: define connection string #2; ideally pulling from config
var batchSize = 1000; //todo: replace hardcoded batch size with value from config
var task = CopySqlDataAsync(constr, "select * from XXXX", constr2, "destinationTable", batchSize);
task.Wait(); //waits for the current task to complete / if any exceptions will throw an aggregate exception
}
catch (AggregateException es)
{
var e = es.InnerExceptions[0]; //get the wrapped exception
Console.WriteLine(e.Message);
//throw; //to rethrow AggregateException
ExceptionDispatchInfo.Capture(e).Throw(); //to rethrow the wrapped exception
}
}
Something went horribly wrong in your design if you even try to process 700k Rows in C#. That you fail at this is to be expected.
If this is data retrieval for display: there is no way the user will be able to process that amount of data, and filtering down from 700k rows in the GUI is just a waste of time and bandwidth. 25-100 fields at once is about the limit. Do the filtering or pagination on the query side so you do not end up retrieving orders of magnitude more than you can actually process.
If this is some form of bulk insert or bulk modification: do that kind of operation in SQL Server, not in your code. Retrieving the data, processing it in C#, and then posting it back just adds layers of overhead. Once you add the two-way network transfer, you will easily triple the time this takes.
A program written in C# using the Oracle client turned out to have a connection leak: it does not close all of its database connections, so after some time it can no longer connect to the database because there are too many open connections.
I wrote the following helper function (quite expensive):
private static int tryFindConnCount(){
var connstk = new Stack<Oracle.ManagedDataAccess.Client.OracleConnection>();
try
{
for (var i = 0; i < 10000; ++i)
{
var conn = new Oracle.ManagedDataAccess.Client.OracleConnection(
myDatabaseConnection);
conn.Open();
connstk.Push(conn);
}
}
catch(Exception e)
{
foreach (var conn in connstk)
{
conn.Close();
}
}
return connstk.Count;
}
Here is the code in a test case that uses the above:
var co = tryFindConnCount();
CodeThatMayLeakConnection();
var cn = tryFindConnCount();
Assert.That(cn, Is.EqualTo(co));
It helped me identify at least one case that has a connection leak.
The problem of tryFindConnCount is that it should never be used in production. And I think there should be some way to obtain the same value much cheaper.
How can I do this in the code so I can monitor this value in production?
Finding the places where connections were not closed is a difficult task.
If the program exits without closing a connection, the last SQL statement that session executed is still recorded in the SQL_ID column of v$session (gv$session for RAC). You can search v$session for idle/dead sessions and then use v$sql to find the SQL text, which may tell you more about what was done last. This can give you a hint about where to look in your code.
select a.sid, a.username, a.program, a.machine, a.sql_id, b.sql_fulltext
from v$session a, v$sql b
where b.sql_id(+) = a.sql_id
and a.username is not null -- filter system processes, maybe filter more stuff
;
You can query Oracle DB on "gv$session" view to get the info that you need. With a query on this view you can cyclically monitor the DB every 10-15 minutes for a count of connections from this program.
Example query below :
select count(*)
from gv$session
where machine = 'XXXXX'
and username = 'YYYYY'
and program = 'ZZZZZ';
You only need values that uniquely identify those connections like for example machine from which the connections originate.
Also the query is very light and doesn't add performance overhead.
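To run that count from the application itself, a minimal sketch could look like the following. It assumes the account has SELECT privilege on gv$session, and the class and method names (SessionMonitor, CountSessions) are made up for illustration. It is written against the DbConnection base class, so an already-open OracleConnection can be passed in.

```csharp
using System;
using System.Data.Common;

public static class SessionMonitor
{
    // Same filter as the query above, with bind variables instead of literals.
    public const string SessionCountSql =
        "select count(*) from gv$session " +
        "where machine = :machine and username = :username and program = :program";

    // Counts the sessions matching the given identity, using an already-open connection.
    public static int CountSessions(DbConnection openConnection,
                                    string machine, string username, string program)
    {
        using (DbCommand cmd = openConnection.CreateCommand())
        {
            cmd.CommandText = SessionCountSql;
            // Parameters are added in the order the binds appear in the SQL,
            // because ODP.NET binds by position unless BindByName is enabled.
            AddParameter(cmd, "machine", machine);
            AddParameter(cmd, "username", username);
            AddParameter(cmd, "program", program);
            // Oracle returns count(*) as a NUMBER, so convert explicitly.
            return Convert.ToInt32(cmd.ExecuteScalar());
        }
    }

    private static void AddParameter(DbCommand cmd, string name, object value)
    {
        DbParameter p = cmd.CreateParameter();
        p.ParameterName = name;
        p.Value = value;
        cmd.Parameters.Add(p);
    }
}
```

Calling CountSessions on a timer and logging the result gives a cheap trend line; a count that only ever grows suggests a leak.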
I am trying to add a row to a table in a PostgreSQL database using ODBC. Although no exceptions are thrown, the row is not being added to the table. Here is my code:
void TestNewRow()
{
try
{
DataSet dataSet = new DataSet();
OdbcDataAdapter adapter = new OdbcDataAdapter();
adapter.SelectCommand =
new OdbcCommand("select read_time from plant_genie.plc_values_by_tag", m_db.GetConnection());
OdbcCommandBuilder builder =
new OdbcCommandBuilder(adapter);
adapter.Fill(dataSet);
DataTable valuesTable = dataSet.Tables[0];
DataRow newRow = valuesTable.NewRow();
newRow["read_time"] = DateTime.Now;
valuesTable.Rows.Add(newRow);
valuesTable.AcceptChanges();
dataSet.AcceptChanges();
adapter.Update(dataSet);
}
catch (Exception ex)
{
int a = 1;
}
}
I have a breakpoint in the exception handler and another at the end of the function. The second breakpoint is hit but not the first, so no exception is being thrown. I have triple-checked that I'm connecting to the correct database. I don't think I should need two AcceptChanges() calls and an Update() call, but even with all of that overkill, I'm still not getting a new row in my table. What am I doing wrong?
I tried to find a duplicate of this question, but there are so many questions about adding rows that if there was a duplicate, it is being hidden.
Thank you for your help.
RobR
Calling AcceptChanges marks all changes as accepted (i.e. it resets the RowState of every row to Unchanged), so Update() finds nothing to save to the database. Remove the AcceptChanges calls on both the table and the dataset and your changes should be saved.
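The effect on the row state can be seen without a database. This is a small sketch using only System.Data; the RowStateDemo class is hypothetical, and the table and column names are borrowed from the question purely for illustration.

```csharp
using System;
using System.Data;

public static class RowStateDemo
{
    // Returns the row's state right after Rows.Add and again after AcceptChanges.
    public static (DataRowState AfterAdd, DataRowState AfterAccept) Run()
    {
        var table = new DataTable("plc_values_by_tag");
        table.Columns.Add("read_time", typeof(DateTime));

        DataRow row = table.NewRow();
        row["read_time"] = DateTime.Now;
        table.Rows.Add(row);
        // The row is now flagged as Added, which is what a data adapter's Update() looks for.
        DataRowState afterAdd = row.RowState;

        table.AcceptChanges();
        // The flag has been cleared, so Update() would skip this row.
        DataRowState afterAccept = row.RowState;

        return (afterAdd, afterAccept);
    }
}
```

In TestNewRow, dropping both AcceptChanges calls leaves the new row in the Added state when adapter.Update(dataSet) runs; the adapter then calls AcceptChanges itself after a successful update.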
I am having some issues with a class that I have created to perform different database commands.
First, the program runs locally and will only ever run locally. It will only ever connect to the database on localhost.
Therefore I have a simple class, called databaseConnector, that lets me pass it a string containing the MySQL query needed to perform the different functions.
For instance.
I use:
var db = new databaseConnector();
db.Update("UPDATE * WHERE.....");
However, it seems if I ever want to use a different query, or another query, it's not working and throwing errors.
For instance.
var db = new databaseConnector();
db.Update("UPDATE * WHERE...");
db.Insert("INSERT INTO * WHERE.....");
Will give me an error on the insert execution. Any ideas why? I have resorted to creating it again. So I have to redo:
db = new databaseConnector();
to then use the Insert command.
Here is an example of my insert function.
public void Insert(string query)
{
//open connection
if (this.OpenConnection() == true)
{
//create command and assign the query and connection from the constructor
MySqlCommand cmd = new MySqlCommand(query, connection);
//Execute command
cmd.ExecuteNonQuery();
//close connection
this.CloseConnection();
}
}
Now that I think about it, should I call db.OpenConnection() before running each query? When I first initialize it with var db = new databaseConnector(), is it opening the connection then? Each function closes the connection afterwards, but it only checks whether the connection is open rather than attempting to open it, running the query, and then closing it.
I've written a simple app (call it app1) that reads a SQLite database and displays the contents in a gridview. I have a separate C# console app (app2) that needs to write to the same database. The problem is that app2 fails with a "database is locked" error. I can see that as soon as I start app1, a userdb-journal file is created. I assume the problem is that app1 opens the database but doesn't release it? This is the code I have for populating the table I bind to the grid in app1.
public DataTable GetAllPeople()
{
var connectionString = "Data Source=" + dbPath + ";Version=3";
using (SQLiteDataAdapter sqlDataAdapter =
new SQLiteDataAdapter("SELECT id,FirstName,LastName,Address FROM Users",
connectionString))
{
using (DataTable dataTable = new DataTable())
{
sqlDataAdapter.Fill(dataTable);
// code to add some new columns here
return dataTable;
}
}
}
Here is the code that populates the gridview:
private void Form1_Load(object sender, EventArgs e)
{
UserDatabase db = new UserDatabase();
db.Initialize();
dataGridView1.DataSource = db.GetAllPeople();
}
How can I fix things so app2 can read and write to the database while app1 is running?
EDIT
Looks like that journal file is only created by app2. I had only noticed the database locked error when app1 was running also, but perhaps app1 is a red herring. App2 is multi-threaded. Perhaps I should start a new question focusing on app2 and multithreaded access?
EDIT
Thanks for all the comments. I've put a lock around all db accesses and wrapped everything up in usings. All seems to be working now.
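The locking described in the edit above might look roughly like this. It is only a sketch: DbGate is a hypothetical helper name, and an in-process lock like this serialises threads inside one application only, not access from a second process.

```csharp
using System;

public static class DbGate
{
    // One lock object shared by every thread in this process that touches the database.
    private static readonly object Gate = new object();

    // Runs the given database work while holding the lock, so only one thread
    // in this process is inside the database code at a time.
    public static T Run<T>(Func<T> work)
    {
        lock (Gate)
        {
            return work();
        }
    }
}
```

Each database call would then be wrapped, for example `var people = DbGate.Run(() => db.GetAllPeople());`, with the connection and command objects inside the delegate still created in using blocks.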
Here is the code: set the parameters on the connection string builder, then build the SQLiteConnection from it.
SQLiteConnectionStringBuilder connBuilder = new SQLiteConnectionStringBuilder();
connBuilder.DataSource = filePath;
connBuilder.Version = 3;
connBuilder.CacheSize = 4000;
connBuilder.DefaultTimeout = 100;
connBuilder.Password = "mypass";
using(SQLiteConnection conn = new SQLiteConnection(connBuilder.ToString()))
{
//...
}
Regards.
Have you asked SQLite to wait and try again if the db is locked? Here's how to do it in C:
// set SQLite to wait and retry for up to 100ms if database locked
sqlite3_busy_timeout( db, 100 );
The point is that SQLite locks the db briefly when it is accessed. If another thread or process accesses it while it is locked, SQLite by default returns an error. But you can make it wait and try again automatically with the above call. This solves many of these kinds of problems.
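From C#, the same setting can be applied per connection with SQLite's busy_timeout pragma. The sketch below assumes an already-open SQLite connection; SqliteBusyTimeout is a hypothetical helper name, and it is typed against DbConnection so that it does not depend on a particular SQLite provider.

```csharp
using System;
using System.Data.Common;
using System.Globalization;

public static class SqliteBusyTimeout
{
    // Builds the PRAGMA statement; negative timeouts are rejected.
    public static string BuildPragma(int milliseconds)
    {
        if (milliseconds < 0)
            throw new ArgumentOutOfRangeException(nameof(milliseconds));
        return "PRAGMA busy_timeout = "
            + milliseconds.ToString(CultureInfo.InvariantCulture) + ";";
    }

    // Applies the timeout to an already-open SQLite connection.
    public static void Apply(DbConnection openConnection, int milliseconds)
    {
        using (DbCommand cmd = openConnection.CreateCommand())
        {
            cmd.CommandText = BuildPragma(milliseconds);
            cmd.ExecuteNonQuery();
        }
    }
}
```

The pragma applies only to the connection it is run on, so it would need to be called right after each conn.Open().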