Copy data from postgres to sql server with very similar schema - c#

I have data in Postgres that I need to programmatically move to SQL Server on a scheduled basis. The tables all have the same schema and are not very relational (no foreign keys), but I think I can just use WITH NOCHECK if there are any.
I have ADO.NET connectors for both servers, so I think I can just select from Postgres into a DataTable and then INSERT INTO the SQL Server tables. However, I'm a little concerned about performance.
My question is: how hard would it be to use the Postgres COPY command to export to, say, CSV, and then use bcp.exe to import the CSV into SQL Server? Does this sound like the way to go? I can see this needing to scale to ~750k records copied per scheduled run.

Related

What is the benefit of using Azure SQL elastic queries vs handling the cross DB queries in my .Net code?

I have an Azure SQL database, database1, on server Server1.database.windows.net.
I need to retrieve some records from this database and insert them into a table in a different database on a different Azure server.
Do you think it's better to do this in .NET or with elastic queries?
Also, are there any limitations on elastic queries?
Your requirement is to query data from one database and then insert the returned data into another database. In my opinion, doing this in your program's code would fully meet the requirement; elastic query is not needed for this scenario. Normally the goal of elastic query is to facilitate querying scenarios where multiple databases contribute rows to a single overall result. You can find detailed information on the elastic query feature in this article.
Besides, sp_execute_remote can help us execute remote stored procedure calls or remote functions, which could be another approach for your scenario.
Are there any limitations on elastic queries?
Under the "Preview limitations" section of this article, you can see:
Running your first elastic query can take up to a few minutes on the Standard performance tier. This time is necessary to load the elastic query functionality; loading performance improves with higher performance tiers.
Scripting of external data sources or external tables from SSMS or SSDT is not yet supported.
Import/Export for SQL DB does not yet support external data sources and external tables. If you need to use Import/Export, drop these objects before exporting and then re-create them after importing.
Elastic query currently only supports read-only access to external tables. You can, however, use full T-SQL functionality on the database where the external table is defined. This can be useful to, e.g., persist temporary results using, e.g., SELECT INTO, or to define stored procedures on the elastic query database which refer to external tables.
Except for nvarchar(max), LOB types are not supported in external table definitions. As a workaround, you can create a view on the remote database that casts the LOB type into nvarchar(max), define your external table over the view instead of the base table and then cast it back into the original LOB type in your queries.
Column statistics over external tables are currently not supported. Table statistics are supported, but need to be created manually.

Building Queries Using Access and Sql Database

I am working in a company where we have hundreds of tables and queries in MS Access and several tables in our SQL Server. Sometimes we need to combine results from both databases.
When we try to do it by linking our SQL tables to MS Access over an ODBC connection, the results come back so slowly that our clients complain. So, besides a non-ODBC connection, here is what I am looking for:
Being able to build a query using Sql DB and Access DB visually by dragging and dropping tables.
Being able to save these queries and re-run them using C#
Being able to use sub-queries (query in another query)
My question is whether there is any kind of application or API (paid or free) that meets all these requirements. Or is it possible to build such a thing?
SQL Server Management Studio does all this. But joining a SQL Server table directly with an Access table is not a good idea and will always be slow. You need to upload the Access table data to a temporary table in SQL Server and join from there.
You can use VBA to import the recordset into a temp table and then do the join. If you're pulling both datasets into a C# app and joining there, then yes, that will be somewhat more efficient than joining in Access, but it is certainly not the same thing. Joining a large Access table with another large linked table will always be slow.

How to find any missing columns, constraints, indexes on a database as compared to another one

I have a C#.NET Windows-based application that uses a database in Microsoft SQL Server 2008. When we first deploy to a client, we create a copy of our database and deploy it on the client's remote server along with the UI application. The client database can be on SQL Server 2005 or higher.
Over time, the UI application and its database have gone through many changes. Since this is a thick-client application, the clients' databases are not in sync with our latest database, and unfortunately no one ever kept notes of all the changes. So my challenges are as follows:
How do I find any columns missing from a table in the client's database as compared to my database?
How do I find any primary/unique constraints missing from a table in the client's database as compared to my database?
How do I find any indexes missing from a table in the client's database as compared to my database?
Please keep in mind that a client database may range from 10 to 100 GB, so I cannot just drop all the client tables and recreate them.
You can use Data-tier applications. It's built-in feature of SQL Server, so you don't need to use any extra tools.
You can extract data-tier application from your database (in SSMS right-click -> Tasks -> Extract data-tier application) to a DACPAC file, copy the file to the client's server and use it to upgrade the DB there (or generate update script).
It also integrates nicely with SQL Server Data Tools.
For this task, you need software that compares SQL databases. Just as there are many tools for comparing text, there are many for comparing databases.
Personally, I use AdeptSQL Diff, but there are plenty of others. RedGate has developed one too, and I know others exist. Just type "SQL database compare" into Google to find them. You can probably get the job done within the trial period.
These tools show you which tables were added, deleted or changed. They do the same for views, indexes, triggers, stored procedures, user-defined functions and constraints. More importantly, they generate a script to push the modifications to the target database. Very handy, but review the generated script: it sometimes gets things wrong by deleting data, though that is easy to fix.
There is also an option to compare the data in a specific table if you need to.
With SQL Server Management Studio, you can try selecting a database and then Tasks -> Generate Scripts, choosing the appropriate options.
Do the same for the two databases you want to compare. You will get two text files that you can compare with a text diff tool.
The comparison will highlight the differences in the database structure.
It's not the best way to do it, of course, but it can be a start. If the two databases are not too different, you should be able to handle the differences.
A better option is to use database comparison software. Such tools are meant to compare database structure, constraints, indexes and so on. I have never used any of them, so I cannot give any advice on that.
If it is a one-time thing, use any database diff tool. VS2010+ has a built-in one, which lets you get the differences in schema and data in two different files.
If you want to fix your development process, you have a wide range of options for implementing database versioning.
If you are using EF, use Migrations; you can't beat that.
If you are only on SQL Server and never looking at other RDBMSs, check out DAC (data-tier applications, mentioned by Jakub).
Otherwise, take a look at more generic solutions. Among them I would recommend DbUp, and if Python code is fine for you, check out Alembic, which lets you write your migrations using a really nice Python API.
If nothing works for you, create a snapshot of the current database schema and start writing differential scripts that you can apply with a self-written tool or DbUp.
I am not sure if this will help, but who knows.
Is there any way to restore the server database in your local environment? If the answer is yes, you can try joining the system views of each database and comparing them.
I propose something like this (it was a quick solution, so please excuse the formatting and other rough edges).
USE [master]
GO

-- LocalTable and ServerTable are the names of the two databases being compared.
-- Step 1: collect column metadata from the local database.
SELECT
    LocalDataBaseTable.name AS TableName,
    LocalDataBaseTableColumns.name AS [Column],
    LocalDataBaseTypes.name AS DataType,
    LocalDataBaseTableColumns.max_length,
    LocalDataBaseTableColumns.[precision]
INTO #tmpLocalInfo
FROM LocalTable.sys.columns AS LocalDataBaseTableColumns
INNER JOIN LocalTable.sys.tables AS LocalDataBaseTable
    ON LocalDataBaseTableColumns.object_id = LocalDataBaseTable.object_id
INNER JOIN LocalTable.sys.types AS LocalDataBaseTypes
    ON LocalDataBaseTypes.user_type_id = LocalDataBaseTableColumns.user_type_id

-- Step 2: collect the same metadata from the restored server database.
SELECT
    ServerDataBaseTable.name AS TableName,
    ServerDataBaseTableColumns.name AS [Column],
    ServerDataBaseTypes.name AS DataType,
    ServerDataBaseTableColumns.max_length,
    ServerDataBaseTableColumns.[precision]
INTO #tmpServerInfo
FROM ServerTable.sys.columns AS ServerDataBaseTableColumns
INNER JOIN ServerTable.sys.tables AS ServerDataBaseTable
    ON ServerDataBaseTableColumns.object_id = ServerDataBaseTable.object_id
INNER JOIN ServerTable.sys.types AS ServerDataBaseTypes
    ON ServerDataBaseTypes.user_type_id = ServerDataBaseTableColumns.user_type_id

-- Step 3: keep the server columns that have no match in the local database.
SELECT
    #tmpServerInfo.*
FROM #tmpLocalInfo
RIGHT OUTER JOIN #tmpServerInfo
    ON #tmpLocalInfo.TableName = #tmpServerInfo.TableName COLLATE DATABASE_DEFAULT
    AND #tmpLocalInfo.[Column] = #tmpServerInfo.[Column] COLLATE DATABASE_DEFAULT
WHERE #tmpLocalInfo.[Column] IS NULL

DROP TABLE #tmpLocalInfo
DROP TABLE #tmpServerInfo
This returns every column that exists in the server database but is missing from your local one. The idea is to explore the sys views and see whether they give you a suitable solution.
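The comparison itself is just a set difference over (table, column) pairs, so it can also be done in application code once the metadata has been fetched. A minimal Python sketch, assuming the rows from the two sys.columns queries have already been loaded into memory; the Column type, the missing_columns function and the sample rows are hypothetical and only illustrate the idea:

```python
from typing import Iterable, List, NamedTuple


class Column(NamedTuple):
    table: str
    name: str
    data_type: str


def missing_columns(reference: Iterable[Column], target: Iterable[Column]) -> List[Column]:
    """Return the columns present in `reference` but absent from `target`.

    Names are compared case-insensitively, which assumes a
    case-insensitive collation on both databases.
    """
    have = {(c.table.lower(), c.name.lower()) for c in target}
    return sorted(
        c for c in reference if (c.table.lower(), c.name.lower()) not in have
    )


# Hypothetical metadata, shaped like the output of the sys.columns queries.
server_columns = [
    Column("Orders", "Id", "int"),
    Column("Orders", "ShippedOn", "datetime"),
    Column("Customers", "Id", "int"),
]
local_columns = [
    Column("orders", "id", "int"),
    Column("Customers", "Id", "int"),
]

diff = missing_columns(server_columns, local_columns)
for col in diff:
    print(f"{col.table}.{col.name} ({col.data_type}) is missing")
```

The same function could be reused for indexes or constraints by feeding it the corresponding metadata rows instead of columns.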
You can use this simple script, which shows you the differences between tables, views, indexes, etc.
Compalex is a free lightweight script to compare two database schemas. It supports MySQL, MS SQL Server and PostgreSQL.
Or look at the question "Compare two MySQL databases". It is about comparing two MySQL schemas, but some of the listed tools support MSSQL or have an MSSQL version (for example http://www.liquibase.org/).
Another answer: "What is best tool to compare two SQL Server databases (schema and data)?"

automated mdb to sql server

I realize you can normally use the Upsizing Wizard in Access to convert this, but this is a server-side process where we receive the MDB files from a third party on a daily basis, so I have to be able to ingest them with a no-touch architecture.
Currently, I'm about to set out to write it all by hand (ugh) where I read the access database through a datasource and punch it up into sql server through bulk inserts or entity framework. I really wish there were a better way to do this though. I'm willing to entertain lots of creative methods as there are a LOT of tables and a TON of data.
There are a number of methods that come to mind, which do all indeed involve custom programming, but should be relatively simple and straightforward to implement.
1. From another Access DB, open the source DB programmatically (i.e., with VBA). Create linked tables to the SQL backend in the source DB. Copy the data from the source DB to the linked table (using insert dest select * from source).
2. Use OPENDATASOURCE or OPENROWSET with SQL Server to connect directly to the Access DB and copy the data. You can again use insert dest select * from source to copy the data, or select * into dest from source to create a new table from the source data. This involves tweaking some system settings on SQL Server, since ad hoc distributed queries are not enabled by default, but a few Google searches should get you started.
3. From a .NET program, use SqlBulkCopy (the .NET class that provides bcp-style bulk loading) to upload data from the Access database. Just work with the data directly through ADO.NET, as there's no reason to build an entire EF layer just for migrating data from one source to another.
I have used variations of all three methods above in various projects, but for moving a large number of tables, I have found option #2 to be relatively efficient. It will involve some dynamic SQL code if your table names are dynamic on a daily basis, but if they are static, you should only have to write the logic once and use a parameter for the filename to read from.
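The SqlBulkCopy approach boils down to streaming rows from a source reader into the destination in fixed-size batches rather than one INSERT per row. A minimal Python sketch of that pattern follows. It is not SqlBulkCopy: the standard-library sqlite3 module stands in for both the Access source and the SQL Server destination, and the copy_table function and materials table are hypothetical.

```python
import sqlite3
from typing import List


def copy_table(src: sqlite3.Connection, dst: sqlite3.Connection,
               table: str, columns: List[str], batch_size: int = 1000) -> int:
    """Copy all rows of `table` from src to dst in batches; return the row count.

    `table` and `columns` are interpolated into the SQL text, so they must
    come from trusted code, never from user input.
    """
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    insert_sql = f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})"
    reader = src.execute(f"SELECT {col_list} FROM {table}")
    copied = 0
    while True:
        batch = reader.fetchmany(batch_size)  # stream, do not load everything
        if not batch:
            break
        with dst:  # commit one transaction per batch
            dst.executemany(insert_sql, batch)
        copied += len(batch)
    return copied


# In-memory stand-ins for the source and destination databases.
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
for db in (src, dst):
    db.execute("CREATE TABLE materials (id INTEGER PRIMARY KEY, name TEXT)")
src.executemany("INSERT INTO materials VALUES (?, ?)",
                [(i, f"item-{i}") for i in range(2500)])
src.commit()

copied = copy_table(src, dst, "materials", ["id", "name"], batch_size=1000)
print(copied)
```

Batching keeps memory use flat regardless of table size, which is the main reason to prefer it over filling a full in-memory table first when there is a lot of data.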

Cross-referencing across multiple databases

I have two databases: one is an MS Access file, the other is a SQL Server database. I need to create a SELECT command that filters data from the SQL Server database based on the data in the Access database. What is the best way to accomplish this with ADO.NET?
Can I pull the required data from each database into two new tables, put these in a single DataSet, and then perform another SELECT command on the DataSet to combine the data?
Additional Information:
The Access database is not permanent. The Access file to use is set at runtime by the user.
Here's a bit of background information to explain why there are two databases. My company uses a CAD program to design buildings. The program stores materials used in the CAD model in an Access database. There is one file for each model. I am writing a program that will generate costing information for each model. This is based on current material prices stored in a SQL Server database.
My Solution
I ended up just importing the data from the Access database into a temporary table in the SQL Server database, performing all the necessary processing, and then removing the temporary table. It wasn't a pretty solution, but it worked.
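A rough sketch of that stage, join and drop sequence, written in Python with the standard-library sqlite3 module standing in for SQL Server. The per-model Access data is represented by a plain list, and the prices and model_materials tables and their columns are hypothetical:

```python
import sqlite3

# Stand-in for the SQL Server database holding current material prices.
server = sqlite3.connect(":memory:")
server.execute("CREATE TABLE prices (material TEXT PRIMARY KEY, unit_price REAL)")
server.executemany("INSERT INTO prices VALUES (?, ?)",
                   [("steel", 2.5), ("glass", 4.0), ("timber", 1.25)])

# Rows as they might be read from one model's Access file: (material, quantity).
model_rows = [("steel", 100), ("timber", 40)]

# 1. Stage the small dataset in a temporary table on the server.
server.execute("CREATE TEMP TABLE model_materials (material TEXT, quantity REAL)")
server.executemany("INSERT INTO model_materials VALUES (?, ?)", model_rows)

# 2. Do the processing server-side with an ordinary join.
total_cost = server.execute(
    """
    SELECT SUM(m.quantity * p.unit_price)
    FROM model_materials AS m
    JOIN prices AS p ON p.material = m.material
    """
).fetchone()[0]

# 3. Remove the temporary table once the processing is done.
server.execute("DROP TABLE model_materials")

print(total_cost)
```

Only the small per-model dataset crosses the network; the price table never leaves the server.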
You don't want to pull both datasets across if you don't have to do that. You are also going to have trouble implementing Tomalak's solution since the file location may change and might not even be readily available to the server itself.
My guess is that your users set up an Access database with the people/products or whatever that they are interested in working with and that's why you need to select across the two databases. If that's the case, the Access table is probably smaller than the SQL Server table(s). Your best bet is to pull in the Access data, then use that to generate a filtered query to SQL Server so that you can minimize the data that is sent over the network.
So, the most important things are:
Filter the data ON THE SERVER so that you can minimize network traffic and also because the database is going to be faster at filtering than ADO.NET
If you have to choose a dataset to pull into your application, pull in the smaller dataset and then use that to filter the other table.
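The two points above can be sketched as follows: read the keys from the smaller dataset, then send them to the server as a parameterized IN list so that the filtering happens there. This is a minimal Python sketch with the standard-library sqlite3 module standing in for SQL Server; the fetch_prices function and the prices table are hypothetical.

```python
import sqlite3
from typing import Dict, Iterable


def fetch_prices(conn: sqlite3.Connection, materials: Iterable[str]) -> Dict[str, float]:
    """Fetch prices only for the given materials, filtering on the server."""
    keys = sorted(set(materials))
    if not keys:
        return {}  # an empty IN () list is not valid SQL
    placeholders = ", ".join("?" for _ in keys)
    query = (
        "SELECT material, unit_price FROM prices "
        f"WHERE material IN ({placeholders})"
    )
    # Values travel as parameters; only the placeholder list is built as text.
    return dict(conn.execute(query, keys))


# Stand-in for the larger server-side table.
server = sqlite3.connect(":memory:")
server.execute("CREATE TABLE prices (material TEXT PRIMARY KEY, unit_price REAL)")
server.executemany("INSERT INTO prices VALUES (?, ?)",
                   [("steel", 2.5), ("glass", 4.0), ("timber", 1.25)])

# Keys pulled from the smaller dataset (for example, the Access file).
wanted = ["steel", "timber", "steel"]
prices = fetch_prices(server, wanted)
print(prices)
```

Database servers cap the number of parameters a single statement may carry, so for very long key lists a staging table is likely the better fit.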
Assuming SQL Server can get to the Access databases, you could construct an OPENROWSET query across them.
SELECT a.*
FROM SqlTable AS a
JOIN OPENROWSET(
    'Microsoft.Jet.OLEDB.4.0',
    'C:\Program Files\Microsoft Office\OFFICE11\SAMPLES\Northwind.mdb';'admin';'',
    Orders
) AS b
    ON a.Id = b.Id
You would just change the path to the Access database at runtime to get to different MDBs. Because OPENROWSET does not accept variables for its arguments, that means building the statement as dynamic SQL.
First you need to do something on the server - reference the Access DB as a "Linked Server".
Then you will be able to query it from within the SQL server, pulling out or stuffing in data however you like. This web page gives a nice overview on how to do it.
http://blogs.meetandplay.com/WTilton/archive/2005/04/22/318.aspx
If I read the question correctly, you are NOT attempting to cross reference across multiple databases.
You merely need to reference details about a particular FILE, which in this case could contain:
primary key, parent file checksum (if it is a modification), file checksum, last known author, revision number, date of last change...
You would then use that primary key when adding the information your program obtains from analysing that file.
If you actually do need a distributed database, perhaps you would prefer to use a non-relational database such as LDAP.
If you can't use LDAP, but must use a relational database, you might consider using GUIDs to ensure that your primary keys are good.
Since you don't give enough information, I'm going to have to make some assumptions.
Assuming:
The SQL Server and the Access Database are not on the same computer
The SQL Server cannot see the Access database over a file share or it would be too difficult to achieve this.
You don't need to do joins between the Access database and the SQL Server, only use data from the Access database as lookup elements of your WHERE clause
If the above assumptions are correct, then you can simply use ADO to open the Access database and retrieve the data you need, possibly in a dataset or datatable. Then extract the data you need and feed it to a different ADO query to your SQL Server in a dynamic Where clause, prepared statement, or via parameters to a stored procedure.
The other solutions people are giving all assume you need to do joins on your data or otherwise execute SQL which includes both databases. To do that, you have to use linked databases, or else import the data into a table (perhaps temporary).
Have you tried benchmarking what happens if you link from the Access front end to your SQL Server via ODBC and write your SQL as though both tables are local? You could then do a trace on the server to see exactly what Jet sends to the server. You might be surprised at how efficient Jet is with this kind of thing. If you're linking on a key field (e.g., an ID field, whether from the SQL Server or not), it would likely be the case that Jet would send a list of the IDs. Or you could write your SQL to do it that way (using IN (SELECT ...) in your WHERE clause).
Basically, how efficient things will be depends on where your WHERE clause is going to be executed. If, for instance, you are joining a local Jet table with a linked SQL Server table on a single field, and filtering the results based on values in the local table, it's very likely to be extremely efficient, in that the only thing Jet will send to the server is whatever is necessary to filter the SQL Server table.
Again, though, it's going to depend entirely on exactly what you're trying to do (i.e., which fields you're filtering on). But give Jet a chance to see if it is smart, as opposed to assuming off the bat that Jet will screw it up. It may very well require some tweaking to get Jet to work efficiently, but if you can keep all your logic client-side, you're better off than trying to muck around with tracking all the Access databases from the server.
