I have a local SQL Server 2017 database, and I need to copy two tables to an Azure SQL database; one table has over 100 million rows of data and includes a "geography" type column. How do I do that?
I am currently running a bulk copy:
using (SqlConnection streamsConnection = new SqlConnection(streamsConnectionString))
{
    streamsConnection.Open();
    using (SqlConnection cloudConnection = new SqlConnection(cloudConnectionString))
    {
        cloudConnection.Open();
        using (SqlCommand cmd = streamsConnection.CreateCommand())
        using (SqlBulkCopy bcp = new SqlBulkCopy(cloudConnection))
        {
            bcp.DestinationTableName = "GroundDataNodes";
            bcp.BatchSize = 200000;
            bcp.BulkCopyTimeout = 1200;
            bcp.NotifyAfter = 100000;
            bcp.SqlRowsCopied += new SqlRowsCopiedEventHandler(s_SqlRowsCopied);
            cmd.CommandText = "SELECT [Id],[nodeid],[latlon],[type],[label],[code],[lat],[lon] FROM [dbo].[GroundDataNodes]";
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                bcp.WriteToServer(reader);
            }
            Console.WriteLine("Finished!");
            Console.ReadLine();
        }
    }
}
But I'm quite new to the bulk load side and am wondering how I can improve this so it doesn't take weeks to run...
Give Azure Database Migration Service a try. I migrated an on-premises SQL Server database with 10 million rows in about a day. But of course it also largely depends on your bandwidth.
Also:
Use the multi CPU General Purpose Pricing Tier when you create your service instance to allow the service to take advantage of multiple vCPUs for parallelization and faster data transfer.
Temporarily scale up your Azure SQL Database target instance to the Premium tier SKU during the data migration operation to minimize Azure SQL Database throttling that may impact data transfer activities when using lower-level SKUs.
I have tried all sorts of upload approaches (bulk upload, migration, backup, etc.), but they all had the same problem: my upload speed isn't up to it. They would work, but take days to run. So I decided to write a server-side piece of code to populate the database directly from there, taking my upload speed out of the equation. I imagine that with better upload speeds the migration tool would have worked fine, even with the geography field and so on; not quick, but it would work.
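For reference, here is a minimal sketch of the same copy with streaming enabled; the EnableStreaming and TableLock settings are additions that are not in the original code, and how much they help depends on the workload:
// Sketch only: same SELECT as the question, streamed through SqlBulkCopy
// with a table lock on the destination. Tune BatchSize for your own data.
using (var source = new SqlConnection(streamsConnectionString))
using (var destination = new SqlConnection(cloudConnectionString))
{
    source.Open();
    destination.Open();
    using (var cmd = new SqlCommand(
        "SELECT [Id],[nodeid],[latlon],[type],[label],[code],[lat],[lon] FROM [dbo].[GroundDataNodes]",
        source))
    using (var reader = cmd.ExecuteReader())
    using (var bcp = new SqlBulkCopy(destination, SqlBulkCopyOptions.TableLock, null))
    {
        bcp.DestinationTableName = "GroundDataNodes";
        bcp.EnableStreaming = true;   // stream rows instead of buffering them in memory
        bcp.BatchSize = 200000;
        bcp.BulkCopyTimeout = 0;      // no timeout; this copy runs for a long time
        bcp.WriteToServer(reader);
    }
}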
Related
I have a query to a DB2 database on an iSeries that I wish to run from a C# program. The query is:
SELECT C.CLINO, SUM(T.VALUE) AS VALUE
FROM GTDATA.CLIENT AS C
INNER JOIN TABLE(GETHOLDDTL(C.CLINO,20150630,'','Y','Y','','','','','','','','','','M')) AS T ON T.CLIENT = C.CLINO
GROUP BY C.CLINO
GETHOLDDTL is an SQL function supplied to us by a third party to allow us to retrieve data from their system in our own programs.
When I run the query in the iSeries interactive SQL environment, saving the output to a file to ensure that I'm timing execution over the whole dataset, it takes approximately 1.5 hours to run. This is fine, as there's a lot of data being calculated on the fly. The problem is that when I run my C# program, the process takes in excess of 24 hours (i.e. that's the point at which I gave up, when it still hadn't got past this point the morning after I started execution).
The C# code I'm using to execute the query is:
var conn = new OdbcConnection();
conn.ConnectionString = @"FILEDSN=S:\qsys2.dsn;UID=XXXXXXX;PWD=XXXXXXXX";
conn.ConnectionTimeout = 0;
conn.Open();

var com = new OdbcCommand(selectCommand, conn);
var reader = com.ExecuteReader();

// Load the full result set into a DataTable
var dt = new DataTable();
dt.Load(reader);

conn.Close();
The delay is happening on the dt.Load() line - this is presumably where the query is actually executed?
What might account for the difference? Is there a better way to implement this? If necessary, I can produce the data using the interactive SQL environment and change my C# code to use the file that it produces, but I'm trying to get this to run without any user interaction, so I'd like to avoid that if possible.
Thanks for any help.
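One way to narrow this down (a rough diagnostic sketch, not a fix) is to time ExecuteReader separately from the row fetch, which shows whether the hours are spent running the query itself or pulling the rows across ODBC into the DataTable:
// Diagnostic sketch: time the ExecuteReader call and the fetch loop separately.
using (var conn = new OdbcConnection(@"FILEDSN=S:\qsys2.dsn;UID=XXXXXXX;PWD=XXXXXXXX"))
using (var com = new OdbcCommand(selectCommand, conn))
{
    conn.Open();
    com.CommandTimeout = 0;
    var sw = System.Diagnostics.Stopwatch.StartNew();
    using (var reader = com.ExecuteReader())
    {
        Console.WriteLine("ExecuteReader returned after " + sw.Elapsed);
        long rows = 0;
        while (reader.Read())
        {
            rows++;   // or process the row here instead of loading a DataTable
        }
        Console.WriteLine(rows + " rows fetched in " + sw.Elapsed);
    }
}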
I am developing a web site that uses SQL Server 2008 R2 Express for its database. In testing, there is a lot of data and many images stored in this database.
According to Wikipedia, the SQL Server Express edition has a 10 GB size limit. When I insert data and reach the limit, what exception will be thrown? Or, how do I detect the approaching-limit problem in code?
I use EF 5 with a code-first approach to insert a large data set.
In tests I have seen that:
sp_spaceused
won't work as expected; it showed 12GB after deleting lots of records. And the other answers regarding querying sys.databases were not clear enough to me.
Searching around, I found a very good explanation of the SQL Server 2012 Express Edition 10GB size limit on Ramon's weblog [EDIT 2018: updated link]:
SELECT
[name] AS [Filename],
[size]/128.0 AS [Filesize],
CAST(FILEPROPERTY([name],'SpaceUsed') AS int)/128.0 AS [UsedSpaceInMB],
[size]/128.0 - CAST(FILEPROPERTY([name],'SpaceUsed') AS int)/128.0 AS [AvailableSpaceInMB],
[physical_name] AS [Path]
FROM sys.database_files
"... space includes the transaction log and it also includes all unused space within these files. .... SQL Server Express will start complaining when it cannot reserve any more space for the datafile."
So checking
CAST(FILEPROPERTY([name],'SpaceUsed') AS int)/128.0 AS [UsedSpaceInMB]
seems to be the best option.
In combination with EF in C#, my request to the DB looks like:
string sqlSelect = "SELECT CAST(FILEPROPERTY([name],'SpaceUsed') AS int)/128.0 AS [UsedSpaceInMB] FROM sys.database_files";
var dbResult = dbInstance.Database.SqlQuery<Decimal>(sqlSelect).FirstOrDefault();
double spaceUsedInGb = Convert.ToDouble(dbResult)/1024;
Execute this SQL command, and it will reveal the disk-space usage of the current database.
sp_spaceused
It can also be used to query the space usage of a specific table. This link provides useful information about this problem.
To check the database size, query:
sys.databases
Just query this, perhaps from C#; or, if you use the SSMS (SQL Server Management Studio) shell, you can schedule a job that emails you, or whatever you want.
Example:
SQL Server 2008: How to query all databases sizes?
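For instance, a rough C# sketch (not from the linked answer) that lists the data-file size of every database from sys.master_files; sizes are stored in 8 KB pages, hence the division by 128:
// Sketch: total ROWS (data) file size per database, in MB.
string sizeSql = @"SELECT DB_NAME(database_id) AS DatabaseName,
                          SUM(size) / 128.0     AS SizeInMB
                   FROM sys.master_files
                   WHERE type_desc = 'ROWS'
                   GROUP BY database_id";
using (var connection = new SqlConnection(connectionString))   // your own connection string
using (var command = new SqlCommand(sizeSql, connection))
{
    connection.Open();
    using (var reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            Console.WriteLine("{0}: {1:N1} MB", reader.GetString(0), reader.GetDecimal(1));
        }
    }
}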
Edit: I'm not sure whether an error is thrown, but it should at least show up in the event log or a SQL log...
Side note:
The Developer edition is only $50 and has the same limits as Datacenter, which holds up to 524 PB.
http://technet.microsoft.com/en-us/library/cc645993%28v=sql.105%29.aspx
Two ways to check the size of the database:
/* New school way - data plus log; run in the local DB that you want to inspect.
   Here you can see the log and the MDF file.
   Note: size is reported in 8 KB pages, so size*8.0/1024.0 gives megabytes. */
SELECT size*8.0/1024.0 AS size_in_mb, *
FROM sys.database_files
GO
/* Old school way; shows sizes for all databases. */
sp_helpdb
FYI - the MDF and NDF (data) files are the only ones that count toward the 10GB size limit.
I am using the following method to calculate the current database size, which is crucial for comparing against the SQL Server Express size limit:
public static int GetDbSizeInMB([NotNull] string connectionString) {
    using (SqlConnection sqlConnection = new SqlConnection(connectionString)) {
        sqlConnection.Open();
        using (var sqlCommand = new SqlCommand()) {
            sqlCommand.CommandType = CommandType.Text;
            sqlCommand.CommandText = @"
                SELECT SUM(CAST(FILEPROPERTY([name],'SpaceUsed') AS int)/128.0) AS [UsedSpaceInMB]
                FROM sys.database_files
                WHERE type_desc like 'ROWS' or type_desc like 'FULLTEXT'
                ";
            sqlCommand.Connection = sqlConnection;
            return Convert.ToInt32(sqlCommand.ExecuteScalar());
        }
    }
}
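Usage is then just a comparison against the edition limit (10 GB = 10240 MB of data files for Express); a quick sketch:
// Sketch: warn when the data files approach the Express limit.
const int expressLimitInMB = 10 * 1024;
int usedMB = GetDbSizeInMB(connectionString);   // the method above
if (usedMB > expressLimitInMB * 0.9)
{
    Console.WriteLine("Warning: database is at {0} MB of the {1} MB limit.", usedMB, expressLimitInMB);
}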
In a Windows Mobile application I am calling a web service to retrieve large amounts of data, which it returns in the form of an ArrayList. After that, I insert the data into a SQL Server CE database on the device. Right now it's taking too much time, as there are a lot of tables and a large amount of data for each table. Please suggest a faster way to insert data into a SQL Server CE database from an ArrayList.
ce_command.Connection = Database.GetDbConnection();
ce_command.CommandType = CommandType.TableDirect;   // direct table access, no query parsing
ce_command.CommandText = "NHH_SOURCE";

System.Data.SqlServerCe.SqlCeResultSet rsSource;
SqlCeUpdatableRecord recSource;

rsSource = ce_command.ExecuteResultSet(System.Data.SqlServerCe.ResultSetOptions.Updatable);
recSource = rsSource.CreateRecord();

NPFWebService.WebServiceGetNHH_Source[] get_source = null;
get_source = npf_WS.GetSourceData();

if (get_source.Length > 0)
{
    for (int i = 0; i < get_source.Length; i++)
    {
        recSource.SetValue(0, get_source[i].sourceID);
        recSource.SetValue(1, get_source[i].sourceName.Replace("'", "''"));
        recSource.SetValue(2, get_source[i].organizationID);
        recSource.SetValue(3, get_source[i].transferF);
        recSource.SetValue(4, get_source[i].transferDate);
        rsSource.Insert(recSource);
    }
}
Looks like you are already using the fastest approach. Does your table have an index (or indexes)? You might be able to save some time by dropping it and recreating it after the insert is complete.
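For example (a hedged sketch; the index name IX_NHH_SOURCE_sourceID is made up for illustration, and Database.GetDbConnection() is assumed to return the open SqlCeConnection used in the question):
// Hypothetical sketch: drop the index before the TableDirect inserts and
// recreate it afterwards, so it isn't maintained row by row.
using (var drop = new System.Data.SqlServerCe.SqlCeCommand(
    "DROP INDEX NHH_SOURCE.IX_NHH_SOURCE_sourceID", Database.GetDbConnection()))
{
    drop.ExecuteNonQuery();
}

// ... run the SqlCeResultSet insert loop shown in the question ...

using (var create = new System.Data.SqlServerCe.SqlCeCommand(
    "CREATE INDEX IX_NHH_SOURCE_sourceID ON NHH_SOURCE (sourceID)", Database.GetDbConnection()))
{
    create.ExecuteNonQuery();
}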
I'm using SMO to execute a batch SQL script. In Management Studio, the script executes in about 2 seconds. With the following code, it takes about 15 seconds.
var connectionString = GetConnectionString();

// need to use master because the DB in the connection string no longer exists
// because we dropped it already
var builder = new SqlConnectionStringBuilder(connectionString)
{
    InitialCatalog = "master"
};

using (var sqlConnection = new SqlConnection(builder.ToString()))
{
    var serverConnection = new ServerConnection(sqlConnection);
    var server = new Server(serverConnection);

    // hangs here for about 12-15 seconds
    server.ConnectionContext.ExecuteNonQuery(sql);
}
The script creates a new database and inserts a few thousand rows across a few tables. The resulting DB size is about 5MB.
Anyone have any experience with this or have a suggestion on why this might be running so slowly with SMO?
SMO does lots of weird stuff in the background, which is the price you pay for the ability to treat server/database objects in an object-oriented way.
Since you're not using the OO capabilities of SMO, why don't you just ignore SMO completely and simply run the script through normal ADO.NET?
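For example, a minimal sketch of running the same script over plain ADO.NET; unlike SMO, SqlCommand does not understand the GO batch separator, so the script has to be split first (the splitting below is deliberately naive and assumes GO always sits on its own line):
// Sketch: execute the script batch by batch with plain SqlCommand.
using (var sqlConnection = new SqlConnection(builder.ToString()))
{
    sqlConnection.Open();
    var batches = System.Text.RegularExpressions.Regex.Split(
        sql,
        @"^\s*GO\s*$",
        System.Text.RegularExpressions.RegexOptions.Multiline |
        System.Text.RegularExpressions.RegexOptions.IgnoreCase);
    foreach (var batch in batches)
    {
        if (string.IsNullOrWhiteSpace(batch)) continue;
        using (var command = new SqlCommand(batch, sqlConnection))
        {
            command.CommandTimeout = 0;   // the script may run for a while
            command.ExecuteNonQuery();
        }
    }
}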
The best and fastest way to upload records into a database is through SqlBulkCopy.
Particularly when your scripts are ~1000 records plus - this will make a significant speed improvement.
You will need to do a little work to get your data into a DataSet, but this can easily be done using the DataSet xml functions.
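A rough sketch of that idea, assuming the records have been written out as XML that DataSet.ReadXml understands; the file path and destination table name are made up for illustration:
// Sketch: load the records into a DataSet from XML, then bulk copy the first table.
var dataSet = new DataSet();
dataSet.ReadXml(@"C:\temp\records.xml");   // hypothetical export file

using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "dbo.TargetTable";   // hypothetical table
        bulkCopy.WriteToServer(dataSet.Tables[0]);
    }
}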
I am trying to set up a synchronization routine in C# to send data from an MS Access database to a SQL Server. MS Access is not my choice; it's just the way it is.
I am able to query the MS Access database and get an OleDbDataReader record set. I could potentially read each individual record and insert it into SQL Server, but it seems so wasteful.
Is there a better way to do this? I know I could do it in MS Access by linking to SQL Server and performing the update easily, but this is for end users and I don't want them messing with Access.
EDIT:
Just looking at SqlBulkCopy, I think that may be the answer if I get my results into a DataRow[].
I found a solution in .NET that I am very happy with. It allows me to give access to the sync routine to any user within my program. It involves the SqlBulkCopy class.
private static void BulkCopyAccessToSQLServer
    (CommandType commandType, string sql, string destinationTable)
{
    using (DataTable dt = new DataTable())
    {
        using (OleDbConnection conn = new OleDbConnection(Settings.Default.CurriculumConnectionString))
        using (OleDbCommand cmd = new OleDbCommand(sql, conn))
        using (OleDbDataAdapter adapter = new OleDbDataAdapter(cmd))
        {
            cmd.CommandType = commandType;
            cmd.Connection.Open();
            adapter.SelectCommand.CommandTimeout = 240;
            adapter.Fill(dt);
            adapter.Dispose();
        }
        using (SqlConnection conn2 = new SqlConnection(Settings.Default.qlsdat_extensionsConnectionString))
        {
            conn2.Open();
            using (SqlBulkCopy copy = new SqlBulkCopy(conn2))
            {
                copy.DestinationTableName = destinationTable;
                copy.BatchSize = 1000;
                copy.BulkCopyTimeout = 240;
                copy.WriteToServer(dt);
                copy.NotifyAfter = 1000;
            }
        }
    }
}
Basically this puts the data from MS Access into a DataTable; it then uses the second connection (conn2) and the SqlBulkCopy class to send the data from that DataTable to SQL Server. It's probably not the best code, but it should give anyone reading this the idea.
You should harness the power of set-based queries over RBAR (row-by-agonizing-row) efforts.
Look into an SSIS solution to synchronize the data, and then schedule the package to run at regular intervals using SQL Server Agent.
You can call an SSIS package from the command line so you can effectively do it from MS Access or from C#.
Also, SQL Server, the MS Access DB, and the SSIS package do not have to be on the same machine. As long as your calling program can see the SSIS package, and the package can connect to SQL Server and the MS Access DB, you can transfer data from one place to another.
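For example, a hedged sketch of kicking off a package from C# by shelling out to dtexec (the package path is made up; a package deployed to SQL Server would be referenced with /SQL instead of /F):
// Hypothetical sketch: run an SSIS package via dtexec.exe and capture its output.
var startInfo = new System.Diagnostics.ProcessStartInfo
{
    FileName = "dtexec",
    Arguments = "/F \"C:\\Packages\\SyncAccessToSql.dtsx\"",   // made-up path
    UseShellExecute = false,
    RedirectStandardOutput = true
};
using (var process = System.Diagnostics.Process.Start(startInfo))
{
    Console.WriteLine(process.StandardOutput.ReadToEnd());
    process.WaitForExit();
    Console.WriteLine("dtexec exit code: " + process.ExitCode);
}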
It sounds like what you are doing is ETL. There are several tools that are built to do this and to me, there is little reason to reinvent the functionality. You have SQL Server, therefore you have SSIS. It has a ton of tools for automated transformations, cleanups, lookups, etc. that you can use out of the box.
Unless this is a real cut-and-dried data load and there is absolutely no scope for the complexity of the upload to increase later on (yeah, right!), I would go with a tried and tested ETL tool.
If SQL Server Integration Services isn't an option, you could write the data that you read from Access out to a temporary text file and then call bcp.exe to load it into the database.
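A rough sketch of that approach; the file path, table, server, and database names are made up, and accessReader is assumed to be the OleDbDataReader mentioned in the question:
// Hypothetical sketch: dump the Access rows to a tab-delimited file, then load it with bcp.exe.
using (var writer = new System.IO.StreamWriter(@"C:\temp\export.txt"))
{
    while (accessReader.Read())   // OleDbDataReader over the Access table
    {
        var values = new object[accessReader.FieldCount];
        accessReader.GetValues(values);
        writer.WriteLine(string.Join("\t", values));
    }
}

// -S server, -d database, -T trusted connection, -c character data (tab-delimited by default)
var bcp = System.Diagnostics.Process.Start(
    "bcp", "dbo.TargetTable in C:\\temp\\export.txt -S MYSERVER -d MyDb -T -c");
bcp.WaitForExit();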
I have done something like this before.
I used
OleDbConnection aConnection = new OleDbConnection(String.Format("Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0}", fileName));
aConnection.Open();
to open the access db. Then
OleDbCommand aCommand = new OleDbCommand(String.Format("select * from {0}", accessTable), aConnection);
OleDbDataReader aReader = aCommand.ExecuteReader();
to execute the read from the table. Then
int fieldCount = aReader.FieldCount;
to get the field count
while (aReader.Read())
to loop the records and
object[] values = new object[fieldCount];
aReader.GetValues(values);
to retrieve the values.
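Putting those pieces together with SqlBulkCopy, rather than looping over the values by hand, might look something like this; sqlConnectionString is a placeholder and the destination table is assumed to have the same name and a compatible schema:
// Sketch: stream the Access table straight into SQL Server via SqlBulkCopy.
using (var aConnection = new OleDbConnection(
    String.Format("Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0}", fileName)))
using (var sqlConnection = new SqlConnection(sqlConnectionString))
{
    aConnection.Open();
    sqlConnection.Open();
    using (var aCommand = new OleDbCommand(String.Format("select * from {0}", accessTable), aConnection))
    using (var aReader = aCommand.ExecuteReader())
    using (var bulkCopy = new SqlBulkCopy(sqlConnection))
    {
        bulkCopy.DestinationTableName = accessTable;   // assumes matching table name and columns
        bulkCopy.WriteToServer(aReader);
    }
}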
There are several ways to sync, but they can cause problems when you change a field name in SQL Server, add a new column, or delete one. The best option would be:
Create connections for SQL Server and OLE DB.
Write a custom query to fetch records from one connection and save them to the other.
Before executing, make sure your program updates all table definitions.
In my case this helped because the load on SQL Server went down.
Can you not transfer the Access file to the server and delete it once the sync is complete?
You could create a Windows service for that.