So I have two systems I often have to join together to display certain reports. One system stores document metadata in SQL Server, usually keyed by part number. The list of part numbers I want documents for comes from an Oracle table in our ERP system. My current method goes like this:
Get data from ERP (Oracle) system into a DataTable.
Compile a string[] of part numbers from a column.
Use an IN() clause to get all document information from the docs (SQL Server) system into another DataTable.
Add columns to the ERP DataTable and loop through its rows.
Fill in document info from the docs DataTable where erpRow["ITEMNO"] == docRow["ITEMNO"].
This, to me, feels really inefficient. Obviously I can't use one connection string to JOIN the two tables, or use a database link, so I assume there will have to be two calls, one to each database. Is there another way to join these two sets together?
I would suggest a Linked Server approach (http://msdn.microsoft.com/en-us/library/ms188279.aspx). Write a stored procedure on the SQL Server side that pulls the data over from an Oracle linked server, does the JOIN locally, and returns the combined data.
SQL Server has been designed to execute JOINs efficiently. No need to try to recreate that functionality in the app layer.
Since you've ruled out a database link, I would do the following:
Get data from ERP (Oracle) system into a DataTable.
Pass the DataTable to SQL Server as a table-valued parameter (see the sketch after this list)
Return your data (no loops updating an older set)
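A minimal sketch of what that can look like, assuming you create a table type and stored procedure along these lines. All object names, plus the erpTable and sqlConnectionString variables, are placeholders:

-- One-time setup on the SQL Server side (names are placeholders)
CREATE TYPE dbo.PartNumberList AS TABLE (ITEMNO varchar(50) PRIMARY KEY);
GO
CREATE PROCEDURE dbo.GetDocsForParts @Parts dbo.PartNumberList READONLY
AS
    SELECT p.ITEMNO, d.*
    FROM @Parts AS p
    JOIN dbo.Documents AS d ON d.ITEMNO = p.ITEMNO;
GO

// C# side: copy the part numbers from the Oracle result (erpTable) into a
// one-column DataTable and pass it as the TVP.
// Requires System.Data and System.Data.SqlClient.
var parts = new DataTable();
parts.Columns.Add("ITEMNO", typeof(string));
foreach (DataRow row in erpTable.Rows)
    parts.Rows.Add(row["ITEMNO"]);

using (var conn = new SqlConnection(sqlConnectionString))
using (var cmd = new SqlCommand("dbo.GetDocsForParts", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    var p = cmd.Parameters.Add("@Parts", SqlDbType.Structured);
    p.TypeName = "dbo.PartNumberList";
    p.Value = parts;

    var docs = new DataTable();
    new SqlDataAdapter(cmd).Fill(docs);   // joined result, no client-side matching loop
}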
I have a MySQL database of games for various systems (e.g. PlayStation, Atari 2600, etc.). It also holds other information such as alternative game names (for each region), game developer, etc.
This is connected, via TableAdapters created with the DataSource wizard, to a strongly typed DataSet, dbds. A screenshot of some of the tables is below (taken directly from the dataset designer and greatly simplified; there are many more tables in reality):
(I missed off the gi_system table, but you get the gist - SystemID is on gi_game)
This is a large dataset. I want to produce an XML file that holds the data from all the relevant tables for the games of a single system. Then I will import this into another app which has a dataset with the same schema. This essentially allows me to fill that dataset one system at a time. Obviously, any process will also need to export the relevant rows from the other DataTables into the XML file.
Exporting/importing a dataset in its entirety is simple, of course, with .WriteXml. However, I'm puzzling over how to achieve the above.
I know I could connect the other app directly to the MySQL database too, but I'm looking for an offline solution.
I have looked into Linq To Dataset (https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/cross-table-queries-linq-to-dataset) but this only appears to produce a single ‘table’ of results in the query.
The other issue is, if LINQ is suitable, how would I write queries given the relational structures shown in the image above? I get into territory around JOIN types, which is a bit baffling. The LINQ for the above, if it were just game and organisation, is simple:
from g in gi_game
join o in gi_organisation on g.DeveloperID equals o.ID
select new { GameName = g.name, OrgName = o.name }
However, if I also wanted to pull in the table rows for gi_gamealtname, how would I write that into this query, given that the parent-child relation is reversed? Also, this would have to 'reach' another step to pull in the relevant gi_region rows.
The ultimate aim is that in the 'receiving' app, I just do GamesDataSet.ReadXml("export.xml") to load in all the games for the specific system and any associated table rows.
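To illustrate, this is roughly the shape of what I'm imagining, though I don't know if it's the right approach (the relation names and key column here are made up):

// Copy the rows for one system, plus their related rows, into a second
// DataSet with the same schema, then write that out.
DataSet export = dbds.Clone();                 // same schema, no data
export.EnforceConstraints = false;             // relax constraints while copying piecemeal

foreach (DataRow game in dbds.Tables["gi_game"].Select("SystemID = 1"))
{
    export.Tables["gi_game"].ImportRow(game);

    // child rows: gi_game -> gi_gamealtname (relation name assumed)
    foreach (DataRow alt in game.GetChildRows("game_altname"))
    {
        export.Tables["gi_gamealtname"].ImportRow(alt);

        // one step further out: the region row for each alt name
        DataRow region = alt.GetParentRow("altname_region");
        if (region != null && export.Tables["gi_region"].Rows.Find(region["ID"]) == null)
            export.Tables["gi_region"].ImportRow(region);
    }

    // parent row: the developer organisation
    DataRow org = game.GetParentRow("game_organisation");
    if (org != null && export.Tables["gi_organisation"].Rows.Find(org["ID"]) == null)
        export.Tables["gi_organisation"].ImportRow(org);
}

export.WriteXml("export.xml");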
Any help appreciated. Thanks.
EDIT: I've just thought - I don't know if doing the querying on the MySQL database first is a possible solution? That is, perform a SQL query on the data before loading the dataset. The problem with this is twofold: I don't know how I would do this in code from the app side (I know how to do a custom TableAdapter query, but not for a whole dataset), and the problems described above around how to construct such a SQL query remain.
I have one database server, acting as the main SQL Server, containing a table to hold all data. Other database servers (different instances of SQL Server) come and go. When they come online, they need to download data from the main table (for a given time period); they then generate their own additional data in the same local SQL Server table, and every so often they update the main server with only the new data, using a C# program run by a scheduled service. Multiple additional servers could be generating data at the same time, although it's not going to be that many.
The main table will always be online. The additional, non-main database table is not always online and should not be an identical copy of main: first it contains a subset of the main data, then it generates its own additional data in the local table and updates the main table every so often with its changes. There could be a decent number of rows generated and/or downloaded, so an efficient algorithm is needed to copy from the extra database to the main table.
What is the most efficient way to transfer this in C#? SqlBulkCopy doesn't look like it will work because I can't have duplicate entries on the main server, and it would fail the constraint checks since some entries already exist.
You could do it in the DB or in C#. In either case you must do something like Using FULL JOINs to Compare Datasets. You know that already.
The most important thing is to do it in a transaction. If you have 100k rows, split them into 1,000 rows per transaction, or experiment to find the rows-per-transaction size that works best for you.
Use Dapper. It's really fast.
If you have all your data in C#, use a TVP to pass it to a stored procedure. In the stored procedure, use MERGE to UPDATE/DELETE/INSERT the data (a rough sketch follows below).
And lastly, in C# use a Dictionary<TKey, TValue> or another structure with O(1) access time.
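A rough sketch of that TVP + MERGE combination; the table, type, and column names are made up for illustration:

CREATE TYPE dbo.RowBatch AS TABLE (Id int PRIMARY KEY, Payload nvarchar(200));
GO
CREATE PROCEDURE dbo.UpsertRows @Rows dbo.RowBatch READONLY
AS
BEGIN
    MERGE dbo.MainTable AS target
    USING @Rows AS source
        ON target.Id = source.Id
    WHEN MATCHED THEN
        UPDATE SET target.Payload = source.Payload
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (Id, Payload) VALUES (source.Id, source.Payload);
END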
SqlBulkCopy is the fastest way to insert data into a table from a C# program. I have used it to copy data between databases, and so far nothing beats it speed-wise. Here is a nice generic example: Generic bulk copy.
I would use an IsProcessed flag in the table on the main server and keep track of the main table's primary keys when you download data to the local DB server. Then you should be able to do a delete and update against the main server again.
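For what it's worth, a minimal SqlBulkCopy sketch; the connection string, table names, and the newRowsTable DataTable are placeholders. To avoid the duplicate-key problem, you could bulk copy into a staging table and merge from there:

using (var conn = new SqlConnection(mainServerConnectionString))
{
    conn.Open();
    using (var bulk = new SqlBulkCopy(conn))
    {
        bulk.DestinationTableName = "dbo.MainTable_Staging";  // staging, not the real table
        bulk.BatchSize = 1000;
        bulk.WriteToServer(newRowsTable);                     // DataTable of new local rows
    }
    // then run a MERGE / INSERT ... WHERE NOT EXISTS from staging into dbo.MainTable
}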
Here's how I would do it:
Create a stored procedure on the main database which receives a user-defined table type parameter with the same structure as the main table.
It should do something like:
INSERT INTO yourtable SELECT * FROM @tablevar
Or you could use the MERGE statement for insert-or-update functionality.
In code (a Windows service), load all (or part) of the data from the secondary table and send it to the stored procedure as a table-valued parameter.
You could do it in batches of 1,000, and each time a batch is committed, mark it in the source table / source updater code (a rough sketch follows below).
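A rough sketch of the batching loop on the C# side; localTable, localAdapter, the IsProcessed column, and SendBatchToMainServer are placeholders for your own table, adapter, flag, and the call to the stored procedure above:

// Send the local rows to the stored procedure in batches of 1,000,
// marking each batch as processed only after it has been sent.
const int batchSize = 1000;
DataRow[] pending = localTable.Select("IsProcessed = 0");

for (int start = 0; start < pending.Length; start += batchSize)
{
    DataTable batch = localTable.Clone();                  // same columns, no rows
    int end = Math.Min(start + batchSize, pending.Length);

    for (int i = start; i < end; i++)
        batch.ImportRow(pending[i]);

    SendBatchToMainServer(batch);                          // e.g. call the TVP procedure above

    for (int i = start; i < end; i++)
        pending[i]["IsProcessed"] = 1;                     // mark the batch as done
    localAdapter.Update(localTable);                       // persist the flag locally
}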
Can you use linked servers for this? If so, it will make copying data to and from the main server much easier.
When copying data back to the main server I'd use IF EXISTS before each INSERT statement to additionally make sure there are no duplicates, and wrap all the INSERT statements in a transaction so that if an error occurs the transaction is rolled back.
I also agree with others on doing this in batches of 1,000 or so records, so that if something goes wrong you can limit the damage.
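Something along these lines; it's written as IF NOT EXISTS so the INSERT only runs when the row is missing, and THROW needs SQL Server 2012+ (use RAISERROR on older versions). The table and column names are placeholders:

BEGIN TRANSACTION;
BEGIN TRY
    IF NOT EXISTS (SELECT 1 FROM dbo.MainTable WHERE Id = @Id)
        INSERT INTO dbo.MainTable (Id, Payload) VALUES (@Id, @Payload);
    -- ...repeat for the rest of the batch...
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    ROLLBACK TRANSACTION;
    THROW;
END CATCH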
I'm writing a program in C# that will grab data from a staging table, and then insert that same data back into their new respective locations in a SQL Server database. The program will do the following steps sequentially:
Select columns from first row of staging table
Store each column as unique variable
Insert data into new respective locations in the database (each value is going to multiple different tables in the DB, and the values are duplicated between many of the tables)
Move to the next Record
Repeat from step 1 until all records have been processed
So is there a way to iterate through the entire record set, storing each result from a column as a unique variable without having to write separate queries for each value that you want to store? There are 51 columns that all have to go somewhere, and I didn't think it would be very efficient to hardcode 51 variables each with a custom query to the database.
I thought about doing this with a multidimensional array, but then that would just be one string with a ton of values. Any advice would be greatly appreciated.
Although you can do this through a .NET application, really this would be much easier to achieve with a SQL statement. SQL has good syntax for moving data between tables:
INSERT INTO [Destination] ([Columns,])
SELECT [Columns,]
FROM [Source]
If you're moving data between databases, you just need to link one of the databases to the other and then run the query. If you're using SQL Server Management Studio, you can follow this article to set up linked servers. Otherwise, you can use the sp_addlinkedserver procedure to register the linked server.
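For example, registering the other server and then running the INSERT ... SELECT across it might look roughly like this; the server, database, and table names are placeholders:

-- Register the other SQL Server instance as a linked server
EXEC sp_addlinkedserver
    @server = N'REMOTESRV',
    @srvproduct = N'',
    @provider = N'SQLNCLI',
    @datasrc = N'remotehost\instancename';

-- Then use the four-part name in the INSERT ... SELECT
INSERT INTO dbo.Destination (Col1, Col2)
SELECT Col1, Col2
FROM REMOTESRV.SourceDb.dbo.Staging;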
You can create a class that contains a property for each column in your table and use a micro ORM like Dapper to populate a list of instances of those classes from your database. You can then iterate over the list and do your inserts to other tables.
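A minimal sketch of that approach; the class, connection string, and table/column names are made up here:

using System.Collections.Generic;
using System.Data.SqlClient;
using Dapper;

// One property per staging-table column (only a few of the 51 shown here)
public class StagingRow
{
    public int Id { get; set; }
    public string PartNumber { get; set; }
    public decimal Price { get; set; }
}

public static class StagingMover
{
    public static void Move(string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            IEnumerable<StagingRow> rows = conn.Query<StagingRow>("SELECT * FROM dbo.Staging");
            foreach (var row in rows)
            {
                // one insert per destination table; the row object supplies the parameters
                conn.Execute(
                    "INSERT INTO dbo.Parts (PartNumber, Price) VALUES (@PartNumber, @Price)",
                    row);
            }
        }
    }
}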
You could even create other classes for your individual inserts and use AutoMapper to create instances of those from your source class.
But... this might all be overkill for what you are trying to achieve.
Hopefully I can explain what I am trying to do.
I am writing a system to take data stored in SharePoint lists and push it into SQL tables. This is being done so the data from the lists can be joined with other data and reported on.
I need the system to be quite flexible, so I want to store the mapping between the lists and SQL and then create any of the SQL objects that are missing.
So I would first have to check whether the SQL table I want exists and, if not, create it. Then check for all the columns I expect, create any missing ones, and then populate the table with the list data.
Getting the list data is no problem for me, and it isn't a problem to store my configuration information.
My issue is that I'm not sure which .NET features to use when talking to the database. I was looking into Entity Framework and LINQ, but these seem to need fixed tables, which I don't have.
I am also looking at using the Enterprise Library (4.1), as I already use it for event logging.
Ideally what I want to be able to do is build a DataTable and then "compare" it to a SQL table and have the system update it as required.
Does anything like this exist, and what approach would you use?
These may help get you started:
Codeplex - SPListSync
Synchronizes information with other lists or SQL Server tables based on a linked column. This can be helpful when you have a list with companies and another list with contacts: the company information (e.g. business phone and address) can be copied to the linked contacts.
Exporting Data from SharePoint 2007 Lists to SQL Server via SSIS
SO - Easiest way to extract SharePoint list data to a separate SQL Server table?
Commercial
Simego - Data Synchronisation Studio
AxioWorks SQList
You should study SQL Server Management Objects (SMO) a bit; through it you can interact with SQL Server directly and very easily. You can create a new table, a stored procedure, etc., and also check for the pre-existence of any object.
Talking to the database has never been so easy...
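For instance, checking whether a table exists and creating it with SMO looks roughly like this; the server, database, table, and column names are illustrative, and it needs references to the Microsoft.SqlServer.Smo assemblies:

using Microsoft.SqlServer.Management.Smo;

var server = new Server("localhost");
var db = server.Databases["ReportingDb"];

if (!db.Tables.Contains("SharePointItems"))
{
    var table = new Table(db, "SharePointItems");
    table.Columns.Add(new Column(table, "Id", DataType.Int) { Nullable = false });
    table.Columns.Add(new Column(table, "Title", DataType.NVarChar(255)));
    table.Create();
}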
I have two databases, one is an MS Access file, the other is a SQL Server database. I need to create a SELECT command that filters data from the SQL Server database based on the data in the Access database. What is the best way to accomplish this with ADO.NET?
Can I pull the required data from each database into two new tables, put these in a single DataSet, and then perform another SELECT command on the DataSet to combine the data?
Additional Information:
The Access database is not permanent. The Access file to use is set at runtime by the user.
Here's a bit of background information to explain why there are two databases. My company uses a CAD program to design buildings. The program stores materials used in the CAD model in an Access database. There is one file for each model. I am writing a program that will generate costing information for each model. This is based on current material prices stored in a SQL Server database.
My Solution
I ended up just importing the data from the Access DB into a temporary table in the SQL Server DB, performing all the necessary processing, then removing the temporary table. It wasn't a pretty solution, but it worked.
You don't want to pull both datasets across if you don't have to do that. You are also going to have trouble implementing Tomalak's solution since the file location may change and might not even be readily available to the server itself.
My guess is that your users set up an Access database with the people/products or whatever that they are interested in working with and that's why you need to select across the two databases. If that's the case, the Access table is probably smaller than the SQL Server table(s). Your best bet is to pull in the Access data, then use that to generate a filtered query to SQL Server so that you can minimize the data that is sent over the network.
So, the most important things are:
Filter the data ON THE SERVER so that you can minimize network traffic and also because the database is going to be faster at filtering than ADO.NET
If you have to choose a dataset to pull into your application, pull in the smaller dataset and then use that to filter the other table.
Assuming SQL Server can get to the Access database, you could construct an OPENROWSET query across them.
SELECT a.*
FROM SqlTable AS a
JOIN OPENROWSET(
    'Microsoft.Jet.OLEDB.4.0',
    'C:\Program Files\Microsoft Office\OFFICE11\SAMPLES\Northwind.mdb';'admin';'',
    Orders
) AS b ON
    a.Id = b.Id
You would just change the path to the Access database at runtime to get to different MDBs.
First you need to do something on the server - reference the Access DB as a "Linked Server".
Then you will be able to query it from within the SQL server, pulling out or stuffing in data however you like. This web page gives a nice overview on how to do it.
http://blogs.meetandplay.com/WTilton/archive/2005/04/22/318.aspx
If I read the question correctly, you are NOT attempting to cross reference across multiple databases.
You need merely to reference details about a particular FILE, which in this case, could contain:
primary key, parent file checksum (if it is a modification), file checksum, last known author, revision number, date of last change...
And then use that primary key when adding information obtained from analysing that file using your program.
If you actually do need a distributed database, perhaps you would prefer to use a non-relational database such as LDAP.
If you can't use LDAP, but must use a relational database, you might consider using GUIDs to ensure that your primary keys are good.
Since you don't give enough information, I'm going to have to make some assumptions.
Assuming:
The SQL Server and the Access Database are not on the same computer
The SQL Server cannot see the Access database over a file share, or it would be too difficult to set that up.
You don't need to do joins between the Access database and the SQL Server; you only need data from the Access database as lookup elements of your WHERE clause.
If the above assumptions are correct, then you can simply use ADO to open the Access database and retrieve the data you need, possibly into a DataSet or DataTable. Then extract the data you need and feed it to a different ADO query against your SQL Server in a dynamic WHERE clause, a prepared statement, or via parameters to a stored procedure.
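A rough sketch of that pattern; the Access file path, connection strings, and table/column names below are placeholders. It needs System.Data, System.Data.OleDb, and System.Data.SqlClient:

// 1. Read the lookup values from the Access file chosen at runtime
var materialIds = new List<string>();
string accessConn = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + accessFilePath;
using (var conn = new OleDbConnection(accessConn))
using (var cmd = new OleDbCommand("SELECT MaterialId FROM Materials", conn))
{
    conn.Open();
    using (var reader = cmd.ExecuteReader())
        while (reader.Read())
            materialIds.Add(reader.GetString(0));
}

// 2. Use those values to filter the SQL Server query via a parameterised IN list
var paramNames = new List<string>();
for (int i = 0; i < materialIds.Count; i++)
    paramNames.Add("@p" + i);

string sql = "SELECT * FROM Prices WHERE MaterialId IN (" + string.Join(", ", paramNames.ToArray()) + ")";
using (var sqlConn = new SqlConnection(sqlServerConnectionString))
using (var sqlCmd = new SqlCommand(sql, sqlConn))
{
    for (int i = 0; i < materialIds.Count; i++)
        sqlCmd.Parameters.AddWithValue(paramNames[i], materialIds[i]);

    var prices = new DataTable();
    new SqlDataAdapter(sqlCmd).Fill(prices);
}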
The other solutions people are giving all assume you need to do joins on your data or otherwise execute SQL which includes both databases. To do that, you have to use linked databases, or else import the data into a table (perhaps temporary).
Have you tried benchmarking what happens if you link from the Access front end to your SQL Server via ODBC and write your SQL as though both tables are local? You could then run a trace on the server to see exactly what Jet sends to the server. You might be surprised at how efficient Jet is with this kind of thing. If you're linking on a key field (e.g., an ID field, whether from SQL Server or not), it's likely that Jet would send a list of the IDs. Or you could write your SQL to do it that way (using IN (SELECT ...) in your WHERE clause).
Basically, how efficient things will be depends on where your WHERE clause is going to be executed. If, for instance, you are joining a local Jet table with a linked SQL Server table on a single field, and filtering the results based on values in the local table, it's very likely to be extremely efficient, in that the only thing Jet will send to the server is whatever is necessary to filter the SQL Server table.
Again, though, it's going to depend entirely on exactly what you're trying to do (i.e., which fields you're filtering on). But give Jet a chance to see if it is smart, as opposed to assuming off the bat that Jet will screw it up. It may very well require some tweaking to get Jet to work efficiently, but if you can keep all your logic client-side, you're better off than trying to muck around with tracking all the Access databases from the server.