Most efficient method to load DataSet from subset of multiple joined tables - C#

I have a large inventory system, and I'm having to re-write part of the I/O portion of it. At its heart, there's a product table and a set of related tables. I need to be able to read pieces of it as efficiently as possible. From C# I construct this query:
select * -- includes productid
into #tt
from products where productClass = 547 -- possibly more conditions
select * from #tt;
select * from productHistory where productid in (select productid from #tt);
select * from productSuppliers where productid in (select productid from #tt);
select * from productSafetyInfo where productid in (select productid from #tt);
select * from productMiscInfo where productid in (select productid from #tt);
drop table #tt;
This query gives me exactly the results I need: 5 result sets each having zero, one or more records (if the first returns zero rows, the others do as well, of course). The program then takes those result sets and crams them into an appropriate DataSet. (Which then gets handed off into a constructor expecting just these records.) This query (with differing conditions) gets run a lot.
My question is, is there a more efficient way to retrieve this data?
Re-working this as a single join won't work because each child might return a variable number of rows.

If you have an index on products.productClass this might yield better performance.
select * from products where productClass = 547 -- includes productid
select productHistory.*
from productHistory
join products
on products.productid = productHistory.productid
and products.productClass = 547;
...
If productID is a clustered index then you will probably get better performance with
CREATE TABLE #Temp (productid INT PRIMARY KEY CLUSTERED);
insert into #temp
select productid from products where productClass = 547
order by productid;
go
select productHistory.*
from productHistory
join #Temp
on #Temp.productid = productHistory.productid;
A join on a clustered index seems to give the best performance.
Think about it: SQL Server can match the first row, know it can forget about the rest, then move to the second row knowing it can keep moving forward (not go back to the top).
With a WHERE IN (SELECT ...), SQL Server cannot take advantage of that order.
The more tables you need to join, the more reason to use a #temp table, as you only take about a half-second hit creating and populating it.
If you are going to use a #temp table, you might as well make it a structured one.

Make sure that when you JOIN tables you are joining on indexed columns. Otherwise you will end up with table scans instead of index seeks, and your code will be very slow, especially when joining large tables.
Best practice is to optimize your SQL queries to avoid table scans.

If you don't have it already, I would strongly suggest making this a stored procedure.
Also, I suspect, but can't prove without testing it, that you will get better performance if you perform joins on the products table for each of your subtables rather than copying into a local table.
Finally, unless you can combine the data, I don't think there is a more efficient way to do this.

Without seeing your schema and knowing a little more about your data and table sizes, it's hard to suggest definitive improvements on the query side.
However, instead of "cramming the results into an appropriate DataSet," since you are using a batched command to return multiple result sets, you could use SqlDataAdapter to do that part for you:
SqlDataAdapter adapter = new SqlDataAdapter(cmd);
DataSet results = new DataSet();
adapter.Fill(results);
After that, the first result set will be in results.Tables[0], the second in results.Tables[1], etc.
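If you prefer named tables over positional indexes, you can add table mappings before the Fill; a small sketch (the DataSet table names here are assumptions for illustration):
adapter.TableMappings.Add("Table", "Products");
adapter.TableMappings.Add("Table1", "ProductHistory");
// ... "Table2" through "Table4" for the remaining result sets
adapter.Fill(results);
DataTable history = results.Tables["ProductHistory"];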

Related

Calling SQL select statement from C# thousands of times and is very time consuming. Is there a better way?

I get a list of IDs and amounts from an Excel file (thousands of IDs and corresponding amounts). I then need to check the database to see if each ID exists and, if it does, check that the amount in the DB is greater than or equal to the amount from the Excel file.
The problem is that running this select statement upwards of 6,000 times and returning the values I need takes a long time. Even at half a second apiece it will take about an hour to do all the selects. (I normally don't get more than 5 results back per ID.)
Is there a faster way to do this?
Is it possible to somehow pass all the IDs at once, make just one call, and get the whole collection back?
I have tried using SqlDataReaders and SqlDataAdapters, but they seem to be about the same (too long either way).
General idea of how this works below
for (int i = 0; i < ID.Count; i++)
{
SqlCommand cmd = new SqlCommand("select Amount, Client, Pallet from table where ID = @ID and Amount > 0;", sqlCon);
cmd.Parameters.Add("@ID", SqlDbType.VarChar).Value = ID[i];
SqlDataAdapter da = new SqlDataAdapter(cmd);
da.Fill(dataTable);
da.Dispose();
}
Instead of a long IN list (which is difficult to parameterise and has a number of other inefficiencies regarding execution plans: compilation time, plan reuse, and the quality of the plans themselves), you can pass all the values in at once via a table-valued parameter.
See arrays and lists in SQL Server for more details.
Generally I make sure to give the table type a primary key and use OPTION (RECOMPILE) to get the most appropriate execution plans.
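As a rough sketch of that approach (the type name, procedure name, and the varchar length of the ID column are assumptions, not from the original post):
CREATE TYPE dbo.IdList AS TABLE (ID varchar(20) NOT NULL PRIMARY KEY);
GO
CREATE PROCEDURE dbo.usp_GetAmounts
@IDs dbo.IdList READONLY
AS
SELECT t.Amount, t.Client, t.Pallet
FROM dbo.[table] t
JOIN @IDs i ON i.ID = t.ID
WHERE t.Amount > 0
OPTION (RECOMPILE);
And on the C# side, the whole list goes over in one call:
var tvp = new DataTable();
tvp.Columns.Add("ID", typeof(string));
foreach (var id in ID) tvp.Rows.Add(id); // ID is the list from the question's loop
using (var cmd = new SqlCommand("dbo.usp_GetAmounts", sqlCon))
{
    cmd.CommandType = CommandType.StoredProcedure;
    var p = cmd.Parameters.AddWithValue("@IDs", tvp);
    p.SqlDbType = SqlDbType.Structured;
    p.TypeName = "dbo.IdList";
    new SqlDataAdapter(cmd).Fill(dataTable);
}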
Combine all the IDs together into a single large IN clause, so it reads like:
select Amount, Client, Pallet from table where ID in (1,3,5,7,9,11) and Amount > 0;
"I have tried using SqlDataReaders and SqlDataAdapters"
It sounds like you might be open to other APIs. Using Linq2SQL or Linq2Entities:
var someListIds = new List<int> { 1, 5, 6, 7 }; // imagine you load this from wherever
db.MyTable.Where( mt => someListIds.Contains(mt.ID) );
This is safe in terms of avoiding potential SQL injection vulnerabilities and will generate an IN clause. Note however that someListIds can be so large that the generated SQL query exceeds query-length limits, but the same is true of any other technique involving the IN clause. You can easily work around that by partitioning the list into large chunks, as sketched below, and still be tremendously better off than one query per ID.
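A minimal sketch of that chunking workaround (MyTable, db, and the batch size of 1,000 are assumptions):
const int batchSize = 1000;
var results = new List<MyTable>();
for (int i = 0; i < someListIds.Count; i += batchSize)
{
    var chunk = someListIds.Skip(i).Take(batchSize).ToList();
    results.AddRange(db.MyTable.Where(mt => chunk.Contains(mt.ID))); // one IN query per chunk
}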
Use Table-Valued Parameters
With them you can pass a C# DataTable with your values into a stored procedure as a result set/table, which you can join to and do a simple:
SELECT *
FROM YourTable
WHERE NOT EXISTS (SELECT * FROM InputResultSet WHERE YourConditions)
Use the IN operator. Your problem is very common and it has a name: the N+1 performance problem.
Where are you getting the IDs from? If it is from another query, then consider grouping them into one.
Rather than performing a separate query for every single ID that you have, execute one query to get the amount of every single ID that you want to check (or if you have too many IDs to put in one query, then batch them into batches of a few thousand).
Import the data directly to SQL Server. Use a stored procedure to output the data you need.
If you must consume it in the app tier, use the xml data type to pass it into a stored procedure.
You can import the data from the Excel file into SQL Server as a table (using the Import Data wizard). Then you can perform a single query in SQL Server where you join this table to your lookup table, joining on the ID field. There are a few more steps to this process, but it's a lot neater than trying to concatenate all the IDs into a much longer query.
I'm assuming a certain amount of access privileges to the server here, but this is what I'd do given the access I normally have. I'm also assuming this is a one-off task. If not, the import of the data to SQL Server can be done programmatically as well.
The IN clause has limits, so if you go with that approach, make sure a batch size is used to process a fixed number of IDs at a time; otherwise you will hit another issue.
As @Robertharvey has noted, if there are not a lot of IDs and there are no transactions occurring, then just pull all the IDs at once into memory into a dictionary-like object and process them there. Six thousand values is not a lot, and a single select could return all of them within a few seconds.
Just remember that if another process is updating the data, your local cached version may be stale.
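A minimal sketch of that caching idea, assuming string IDs and a decimal Amount (both assumptions):
var amounts = new Dictionary<string, decimal>();
using (var cmd = new SqlCommand("select ID, Amount from table where Amount > 0", sqlCon))
using (var reader = cmd.ExecuteReader())
{
    while (reader.Read())
        amounts[reader.GetString(0)] = reader.GetDecimal(1); // one round trip for everything
}
// For each Excel row: does the ID exist, and is the DB amount sufficient?
// decimal dbAmount;
// bool ok = amounts.TryGetValue(id, out dbAmount) && dbAmount >= excelAmount;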
There is another way to handle this: build an XML document of the IDs and pass it to a stored procedure. Here is the code for the procedure.
IF OBJECT_ID('GetDataFromDatabase') IS NOT NULL
BEGIN
DROP PROCEDURE GetDataFromDatabase
END
GO
--Definition
CREATE PROCEDURE GetDataFromDatabase
@xmlData XML
AS
BEGIN
DECLARE @DocHandle INT
DECLARE @idList TABLE (id INT)
EXEC sp_xml_preparedocument @DocHandle OUTPUT, @xmlData;
INSERT INTO @idList (id) SELECT x.id FROM OPENXML(@DocHandle, '//data', 2) WITH ([id] INT) x
EXEC sp_xml_removedocument @DocHandle;
--SELECT * FROM @idList
SELECT t.Amount, t.Client, t.Pallet FROM yourTable t INNER JOIN @idList x ON t.id = x.id AND t.Amount > 0;
END
GO
--Uses
EXEC GetDataFromDatabase @xmlData = '<root><data><id>1</id></data><data><id>2</id></data></root>'
You can put any logic in the procedure. You can also pass both the ID and the amount via XML, and the list of IDs can be huge.
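On the C# side, building the XML and calling the procedure might look like this (a sketch; skipping XML-escaping is only safe here because the IDs are numeric):
var sb = new StringBuilder("<root>");
foreach (var id in ID)
    sb.AppendFormat("<data><id>{0}</id></data>", id);
sb.Append("</root>");
using (var cmd = new SqlCommand("GetDataFromDatabase", sqlCon))
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.Add("@xmlData", SqlDbType.Xml).Value = sb.ToString();
    new SqlDataAdapter(cmd).Fill(dataTable);
}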
SqlDataAdapter objects are too heavy for this.
Firstly, use stored procedures; they will be faster.
Secondly, process the IDs as a group: pass the whole list of identifiers to the database as a parameter, run one query against it, and return the processed result.
This is quick and efficient, as all of the data-processing logic stays on the database server.
You can select the whole result set (or join multiple 'limited' result sets) and save it all to a DataTable. Then you can do selects and updates (if needed) directly on the DataTable, and plug the new data back in afterwards. This is not super efficient memory-wise, but it is often a very good (and sometimes the only) solution when you are working in bulk and need it to be very fast.
If you have thousands of records, it might take a couple of minutes to populate them all into the DataTable;
then you can search your table like this:
string findMatch = "id = value";
DataRow[] rowsFound = dataTable.Select(findMatch);
Then just loop over the matches: foreach (DataRow dr in rowsFound) { ... }
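Putting that together, a minimal sketch (note the filter string is DataTable.Select syntax evaluated in memory, not SQL sent to the server):
var dataTable = new DataTable();
new SqlDataAdapter("select ID, Amount, Client, Pallet from table where Amount > 0", sqlCon)
    .Fill(dataTable);
foreach (var id in ID)
{
    DataRow[] rowsFound = dataTable.Select("ID = '" + id.Replace("'", "''") + "'");
    foreach (DataRow dr in rowsFound)
    {
        // compare dr["Amount"] against the Excel amount here
    }
}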

Returning a list from an SqlCommand object - how can I give the database as little work as possible?

I am writing a private method for my class. I am passing as a parameter to this a list of integers, representing the ID of a row in my SQL Server 2008 table.
I wish to return a List<string> of the "Name" (a column) on all rows where one of the passed in integers is equal to an "ID". So if I pass in the List<int> {1, 2, 3 }.
I want to essentially run the command (SELECT Name FROM Table WHERE ID = 1 OR ID = 2 OR ID = 3).ToList<string>().
The database I am using is very busy, and thus it is very important that I optimise my solution as much as possible. With this in mind, I am wondering if it would be better practice for me to create a link to this DB using a .dbml file and use Linq to SQL to query the database?
Or simply to create an SQLCommand object, execute it once, iterate over a reader and save it in a List? What is the most optimal way to do this ? Is creating a .dbml file to represent a very busy database bad practice ?
Creating a .dbml has very little to do with server-side performance; that changes the tooling at the calling end - but the server won't really notice the difference between commands coming from a .dbml vs hand-coded, at least not for things this simple (I should note that for complex queries, hand-coded queries can often out-perform machine-generated ones).
In terms of performance at the caller; a .dbml is just a wrapper around all the usual command/reader/etc - it can't make things faster. In some cases it might make it slower, if it doesn't do a good job of parsing an expression, or doesn't cache the parsed outcome (in terms of the TSQL).
What I will say, though, is that dapper will handle this very nicely for you:
var ids = new List<int>{1,2,3};
var names = conn.Query<string>("select Name from Table where ID in @ids",
new {ids}).ToList();
dapper will spot the in @ids usage, and will expand that as parameters, executing:
select Name from Table where ID in (@p__0, @p__1, @p__2)
(or something like that) - passing 1, 2 and 3 as those values.
That gives you:
convenience at the caller
performance at the caller (dapper is heavily optimized)
full parameterization
allowing for optimal query-plan re-use at the server
More generally, dapper will also happily handle general entity mapping, for example:
int id = 12345;
var customer = conn.Query<Customer>("select * from Customer where Id = @id",
new { id }).Single();
Several things I would do:
A. Use a table valued parameter
CREATE TYPE LocationTableType AS TABLE
( ID INT);
GO
B. Use a stored procedure (with your TVP)
CREATE PROCEDURE dbo.usp_GetLocationNames
@TVP LocationTableType READONLY
AS
SET NOCOUNT ON;
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT Name
FROM dbo.Location l
JOIN @TVP t ON l.ID = t.ID
C. Allow dirty reads - SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
D. Don't count rows - SET NOCOUNT ON;
E. Cache the resultset for a period of time
Since I know very little about your application or your situation, these items are 'generally' what I would do with most procs. Obviously, if you were checking someone's bank account balance before dispensing cash you would not allow dirty reads, nor cache the resultset. But in most situations, these things are acceptable.
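On the C# side, passing the TVP might look like this (a sketch assuming the type and procedure above):
var tvp = new DataTable();
tvp.Columns.Add("ID", typeof(int));
foreach (int id in ids) tvp.Rows.Add(id);
var names = new List<string>();
using (var cmd = new SqlCommand("dbo.usp_GetLocationNames", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    var p = cmd.Parameters.AddWithValue("@TVP", tvp);
    p.SqlDbType = SqlDbType.Structured;
    p.TypeName = "dbo.LocationTableType";
    using (var reader = cmd.ExecuteReader())
        while (reader.Read())
            names.Add(reader.GetString(0));
}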
In order to limit the traffic, it is better to reduce the number of round trips to the database, so issue just one command, perhaps using the IN clause instead of multiple ORs, and parameterize your query.
If you select by numeric IDs then it is safe to form the WHERE clause dynamically (i.e. WHERE ID IN (1,2,3,...)).
A more advanced way is to create an SP with an XML parameter. Sample code snippet:
DECLARE @xmlIds AS XML
SET @xmlIds = '<Ids><ID>1</ID><ID>2</ID></Ids>'
SELECT Name FROM Table
WHERE ID IN (
SELECT
Data.row.value('.', 'INT')
FROM
@xmlIds.nodes('/Ids/ID') AS Data(row))
You can do all of this natively using C# and SQL Server 2008.
SQL Server 2008 introduced user-defined table types and table-valued parameters for stored procedures.
So the following would give you exactly what you want,
CREATE TYPE UdtId AS TABLE
(
[ID] INT NOT NULL
PRIMARY KEY NONCLUSTERED ([ID] ASC)
)
CREATE PROCEDURE spGetCustomerByIds
@IDS UdtId READONLY
AS
BEGIN
SELECT
C.*
FROM Customer C
INNER JOIN @IDS I ON
C.ID = I.ID
END
Hopefully the C# code behind it will be obvious, but it would look like,
public foo GetCustomerDataByIds(IEnumerable<int> ids)
{
using (var command = new SqlCommand())
using (var adapter = new SqlDataAdapter())
using (var dataSet = new DataSet())
{
command.CommandText = "spGetCustomerByIds";
command.CommandType = CommandType.StoredProcedure;
command.Parameters.AddWithValue("@IDS", GetDataTableOfIds(ids));
// execute and return the stuff you're after
}
}
private DataTable GetDataTableOfIds(IEnumerable<int> ids)
{
var table = new DataTable();
table.Columns.Add(new DataColumn("ID", typeof(int)));
foreach (var id in ids)
{
table.Rows.Add(id);
}
return table;
}

Joining Lists using Linq returns different result than corresponding SQL query?

I have 2 tables
TableA:
TableAID int,
Col1 varchar(8)
TableB:
TableBID int
Col1 char(8),
Col2 varchar(40)
When I run a SQL query on the 2 tables it returns the following number of rows
SELECT * FROM tableA (7200 rows)
select * FROM tableB (28030 rows)
When joined on col1 and selects the data it returns the following number of rows
select DISTINCT a.Col1,b.Col2 FROM tableA a
join tableB b on a.Col1=b.Col1 (6578 rows)
The above 2 tables are in different databases, so I created 2 EF models, retrieved the data separately, and tried to join the results in code using LINQ with the following function. Surprisingly it returns 2886 records instead of 6578. Am I doing something wrong?
The individual lists seem to return the correct data, but when I join them the SQL query and the LINQ query differ in the number of records returned.
Any help on this greatly appreciated.
// This function is returning 2886 records
public List<tableC_POCO_Object> Get_TableC()
{
IEnumerable<tableC_POCO_Object> result = null;
List<TableA> tableA_POCO_Object = Get_TableA(); // Returns 7200 records
List<TableB> tableB_POCO_Object = Get_TableB(); // Returns 28030 records
result = from tbla in tableA_POCO_Object
join tblb in tableB_POCO_Object on tbla.Col1 equals tblb.Col1
select new tableC_POCO_Object
{
Col1 = tblb.Col1,
Col2 = tblb.Col2
};
return result.Distinct().ToList();
}
The problem lies in the fact that in your POCO world, you're trying to compare two strings using a straight comparison (meaning it's case-sensitive). That might work in the SQL world (unless of course you've enabled case-sensitivity), but doesn't quite work so well when you have "stringA" == "StringA". What you should do is normalize the join columns to be all upper or lower case:
join tblb in tableB_POCO_Object on tbla.Col1.ToUpper() equals tblb.Col1.ToUpper()
Join operator creates a lookup using the specified keys (starts with second collection) and joins the original table/collection back by checking the generated lookup, so if the hashes ever differ they will not join.
Point being, joining OBJECT collections on string data/properties is bad unless you normalize to the same cAsE. For LINQ to some DB provider, if the database is case-insensitive, then this won't matter, but it always matters in the CLR/L2O world.
Edit: Ahh, didn't realize it was CHAR(8) instead of VARCHAR(8), meaning it pads to 8 characters no matter what. In that case, tblb.Col1.Trim() will fix your issue. However, still keep this in mind when dealing with LINQ to Objects queries.
This might happen because you compare a VARCHAR and a CHAR column. In SQL, this depends on the ANSI_PADDING settings on the SQL Server, while in C# the string values are read using the DataReader and compared using standard string functions.
Try tblb.Col1.Trim() in your LINQ statement.
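Combining the two answers above, a defensive version of the join; whether you need Trim, the case normalization, or both depends on your data and collation:
result = from tbla in tableA_POCO_Object
         join tblb in tableB_POCO_Object
         on tbla.Col1.Trim().ToUpperInvariant() equals tblb.Col1.Trim().ToUpperInvariant()
         select new tableC_POCO_Object { Col1 = tblb.Col1, Col2 = tblb.Col2 };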
As SPFiredrake correctly pointed out, this can be caused by case sensitivity, but I also have to ask: why did you write your code that way, and not like this?
// This function is returning 2886 records
public List<tableC_POCO_Object> Get_TableC()
{
return (from tbla in Get_TableA()
        join tblb in Get_TableB() on tbla.Col1 equals tblb.Col1
        select new tableC_POCO_Object
        {
            Col1 = tblb.Col1,
            Col2 = tblb.Col2
        }).Distinct().ToList();
}
where Get_TableA() and Get_TableB() return IEnumerable instead of List. You have to watch out for that, because when you convert to a list the query is executed immediately. You want to send a single query to the database server.

Enhance performance of large slow dataloading query

I'm trying to load data from Oracle to SQL Server. (Sorry for not writing this before.)
I have a table (actually a view which has data from different tables) with at least 1 million records. I designed my package in such a way that I have functions for the business logic and call them directly in the select query.
Ex:
X1(id varchar2)
x2(id varchar2, d1 date)
x3(id varchar2, d2 date)
Select id, x, y, z, decode (.....), x1(id), x2(id), x3(id)
FROM Table1
Note: My table has 20 columns and I call 5 different functions on at least 6-7 columns.
Some functions compare the parameters passed against an audit table and perform logic.
How can I improve the performance of my query, or is there a better way to do this?
I tried doing it in C# code, but the initial select of records is too large for a DataSet and I get an OutOfMemoryException.
My function does selects and then performs logic, for example:
Function(c_x2, eid)
Select col1
into p_x1
from tableP
where eid = eid;
IF (p_x1 IS NULL) THEN
ret_var := 'INITIAL';
ELSIF (p_x1 = 'L') AND (c_x2 = 'A') THEN
ret_var:= 'RL';
INSERT INTO Audit
(old_val, new_val, audit_event, id, pname)
VALUES
(p_x1, c_x2, 'RL', eid, 'PackageProcName');
ELSIF (p_x1 = 'A') AND (c_x2 = 'L') THEN
ret_var := 'GL';
INSERT INTO Audit
(old_val, new_val, audit_event, id, pname)
VALUES
(p_x1, c_x2, 'GL', eid, 'PackgProcName');
END IF;
RETURN ret_var;
I'm getting each row, performing the logic in C#, and then inserting.
If possible INSERT from the SELECT:
INSERT INTO YourNewTable
(col1, col2, col3)
SELECT
col1, col2, col3
FROM YourOldTable
WHERE ....
this will run significantly faster than a single query where you then loop over the result set and have an INSERT for each row.
EDIT, regarding the OP's question edit:
you should be able to replace the function call with plain SQL in your query. Mimic the "INITIAL" case using a LEFT JOIN to tableP, and the "RL" or "GL" values can be calculated using CASE, as sketched below.
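A rough sketch of that rewrite, based on the function body shown above (the join column names are assumptions, and the audit INSERTs would have to be handled separately as their own set-based statements):
SELECT t.id, t.c_x2,
       CASE
           WHEN p.col1 IS NULL THEN 'INITIAL'
           WHEN p.col1 = 'L' AND t.c_x2 = 'A' THEN 'RL'
           WHEN p.col1 = 'A' AND t.c_x2 = 'L' THEN 'GL'
       END AS ret_var
FROM Table1 t
LEFT JOIN tableP p ON p.eid = t.id;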
EDIT based on the OP's recent comments:
since you are loading data from Oracle into SQL Server, this is what I would do: most people that could help have moved on and will not read this question again, so open a new question where you say: 1) you need to load data from Oracle (version) to SQL Server (version); 2) currently you are loading it from one query, processing each row in C#, and inserting it into SQL Server, and it is slow; and all the other details. There are much better ways of bulk loading data into SQL Server. As for this question, you could accept an answer, answer it yourself explaining that you need to ask a new question, or just leave it unaccepted.
My recommendation is that you do not use functions and then call them within other SELECT statements. This:
SELECT t.id, ...
x1(t.id) ...
FROM TABLE t
...is equivalent to:
SELECT t.id, ...
(SELECT x.column FROM x1 x WHERE x.id = t.id)
FROM TABLE t
Encapsulation doesn't work in SQL the way it does in C# and similar languages. While the approach makes maintenance easier, performance suffers because the sub-selects will execute for every row returned.
A better approach would be to update the supporting function to include the join criteria (i.e. "where x.id = t.id", for lack of the real one) in the SELECT:
SELECT x.id,
x.column
FROM x1 x
...so you can use it as a JOIN:
SELECT t.id, ...
x1.column
FROM TABLE t
JOIN (SELECT x.id,
x.column
FROM MY_PACKAGE.x) x1 ON x1.id = t.id
I prefer that to having to incorporate the function logic into the queries for sake of maintenance, but sometimes it can't be helped.
Personally I'd create an SSIS import to do this task. Using a bulk insert you can improve speed dramatically, and SSIS can handle the functions part after the bulk insert.
Firstly you need to find where the performance problem actually is. Then you can look at trying to solve it.
What is the performance of the view like? How long does the view take to execute without any of the function calls? Try running the command:
create table the_view_table
as
select *
from the_view;
How well does it perform? Does it take 1 minute or 1 hour?
How well do the functions perform? According to the description you are making approximately 5 million function calls. They had better be pretty efficient! Also, are the functions defined as deterministic? If the functions are defined using the DETERMINISTIC keyword, then Oracle has a chance of optimizing away some of the calls.
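For reference, a minimal sketch of the DETERMINISTIC declaration in Oracle (the function body is a placeholder):
CREATE OR REPLACE FUNCTION long_slow_function (p_val IN VARCHAR2)
RETURN VARCHAR2
DETERMINISTIC -- promises the result depends only on the inputs
IS
BEGIN
    RETURN UPPER(p_val); -- placeholder logic
END;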
Is there a way of reducing the number of function calls? The functions are being called once the view has been evaluated and the million rows of data are available. But are all the input values from the highest level of the query? Can the function calls be embedded into the view at a lower level? Consider the following two queries. Which would be quicker?
select
f.dim_id,
d.dim_col_1,
long_slow_function(d.dim_col_2) as dim_col_2
from large_fact_table f
join small_dim_table d on (f.dim_id = d.dim_id)
select
f.dim_id,
d.dim_col_1,
d.dim_col_2
from large_fact_table f
join (
select
dim_id,
dim_col_1,
long_slow_function(dim_col_2) as dim_col_2
from small_dim_table) d on (f.dim_id = d.dim_id)
Ideally the second query should run quicker, as it calls the function fewer times (once per dimension row rather than once per fact row).
The performance issue could be in any of these places and until you investigate the issue, it would be difficult to know where to direct your tuning efforts.
A couple of tips:
Don't load all records into RAM but process them one by one.
Try to run as many functions on the client as possible. Databases are really slow to execute user defined functions.
If you need to join two tables, it's sometimes possible to create two connections on the client. Fetch the data main data with connection 1 and the audit data with connection 2. Order the data for both connections in the same way so you can read single records from both connections and perform whatever you need on them.
If your functions always return the same result for the same input, use a computed column or a materialized view (see the sketch just after these tips). The database will run the function once and save the result in a table somewhere. That will make INSERT slow but SELECT quick.
Create a sorted index on your table.
Introduction to SQL Server Indexes; other RDBMSs are similar.
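As an illustration of the computed-column tip above, a minimal SQL Server sketch (the table, column, and UPPER expression are assumptions; the expression must be deterministic to be PERSISTED):
ALTER TABLE dbo.YourTable
    ADD slow_result AS (UPPER(source_col)) PERSISTED; -- computed on INSERT/UPDATE, not on every SELECT
CREATE INDEX IX_YourTable_slow_result ON dbo.YourTable (slow_result);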
Edit since you edited your question:
Using a view is even more sub-optimal, especially when querying single rows from it. I think your "business functions" are actually something like stored procedures?
As others suggested, in SQL always go set based. I assumed you already did that, hence my tip to start using indexing.

What is the best way, algorithm, method to difference large lists of data?

I am receiving a large list of current account numbers daily, and storing them in a database. My task is to find added and released accounts from each file. Right now, I have 4 SQL tables, (AccountsCurrent, AccountsNew, AccountsAdded, AccountsRemoved). When I receive a file, I am adding it entirely to AccountsNew. Then running the below queries to find which we added and removed.
INSERT AccountsAdded(AccountNum, Name) SELECT AccountNum, Name FROM AccountsNew WHERE AccountNum NOT IN (SELECT AccountNum FROM AccountsCurrent)
INSERT AccountsRemoved(AccountNum, Name) SELECT AccountNum, Name FROM AccountsCurrent WHERE AccountNum NOT IN (SELECT AccountNum FROM AccountsNew)
TRUNCATE TABLE AccountsCurrent
INSERT AccountsCurrent(AccountNum, Name) SELECT AccountNum, Name FROM AccountsNew
TRUNCATE TABLE AccountsNew
Right now, I am differencing about 250,000 accounts, but this is going to keep growing. Is this the best method, do you have any other ideas?
EDIT:
This is an MSSQL 2000 database. I'm using C# to process the file.
The only data I am focused on is the accounts that were added and removed between the last and current files. AccountsCurrent is only used to determine which accounts were added or removed.
To be honest, I think that I'd follow something like your approach. One thing is that you could remove the truncate, do a rename of the "new" to "current" and re-create "new".
Sounds like a history/audit process that might be better done using triggers. Have a separate history table that captures changes (e.g., timestamp, operation, who performed the change, etc.)
New and deleted accounts are easy to understand. "Current" accounts implies that there's an intermediate state between being new and deleted. I don't see any difference between "new" and "added".
I wouldn't have four tables. I'd have a STATUS table that would have the different possible states, and ACCOUNTS or the HISTORY table would have a foreign key to it.
Using IN clauses on long lists can be slow.
If the tables are indexed, using a LEFT JOIN can prove to be faster...
INSERT INTO [table] (
[fields]
)
SELECT
[fields]
FROM
[table1]
LEFT JOIN
[table2]
ON [join condition]
WHERE
[table2].[id] IS NULL
This assumes 1:1 relationships and not 1:many. If you have 1:many you can do any of...
1. SELECT DISTINCT
2. Use a GROUP BY clause
3. Use a different query, see below...
INSERT INTO [table] (
[fields]
)
SELECT
[fields]
FROM
[table1]
WHERE
NOT EXISTS (SELECT * FROM [table2] WHERE [condition to match tables 1 and 2])
-- # This is quick provided that all fields to match the two tables are
-- # indexed in both tables. Should then be much faster than the IN clause.
You could also subtract the intersection to get the differences in one table.
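For example, on SQL Server 2005 and later (not the OP's MSSQL 2000), the set difference can be written directly with EXCEPT:
SELECT AccountNum, Name FROM AccountsNew
EXCEPT
SELECT AccountNum, Name FROM AccountsCurrent; -- the added accounts; swap the operands for the removed ones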
If the initial file is ordered in a sensible and consistent way (big IF!), it would run considerably faster as a C# program which logically compared the files.
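Even without relying on file order, an in-memory diff is straightforward when both lists fit in RAM; a minimal C# sketch assuming one account number per line (file names are placeholders):
var current = new HashSet<string>(File.ReadLines("current.txt"));
var incoming = new HashSet<string>(File.ReadLines("incoming.txt"));
var added = incoming.Except(current).ToList();   // in the new file, not in current
var removed = current.Except(incoming).ToList(); // in current, not in the new file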
