Npgsql C# multi row/value insert - c#

I've been looking at the PostgreSQL multi-row/value insert, which looks something like this in pure SQL:
insert into table (col1, col2, col3) values (1,2,3), (4,5,6)....
The reason I want to use this is that I have a lot of data arriving via a queue, which I'm batching into inserts of 500 to 1000 records at a time to improve performance.
However, I have been unable to find an example of doing this from within C#. Everything I can find adds the parameters for a single record, executes, and repeats, which is too slow.
I have this working with Dapper currently, but I need to expand the SQL to an upsert (insert on conflict update), which everything I have found indicates Dapper can't handle. I have found evidence that Postgres can handle an upsert and multiple value rows in a single statement.
Tom

I'm not sure I fully understood your question, but for bulk inserts in PostgreSQL, this is a good answer.
It gives an example of inserting multiple records from a list (RecordList) into a table (user_data.part_list):
using (var writer = conn.BeginBinaryImport(
    "copy user_data.part_list from STDIN (FORMAT BINARY)"))
{
    foreach (var record in RecordList)
    {
        writer.StartRow();
        writer.Write(record.UserId);
        writer.Write(record.Age, NpgsqlTypes.NpgsqlDbType.Integer);
        writer.Write(record.HireDate, NpgsqlTypes.NpgsqlDbType.Date);
    }
    writer.Complete();
}

COPY is the fastest way but does not work if you want to do UPSERTS with an ON CONFLICT ... clause.
If it's necessary to use INSERT, ingesting n rows (with n possibly varying per invocation) can be done elegantly using UNNEST, like this:
INSERT INTO table (col1, col2, ..., coln) SELECT UNNEST(@p1), UNNEST(@p2), ... UNNEST(@pn);
Each parameter p then needs to be an array of the matching type. Here's an example for an array of ints:
new NpgsqlParameter()
{
    ParameterName = "p1",
    Value = new int[]{1,2,3},
    NpgsqlDbType = NpgsqlDbType.Array | NpgsqlDbType.Integer
}

If you want to insert many records efficiently, you probably want to take a look at Npgsql's bulk copy API, which doesn't use SQL and is the most efficient option available.
Otherwise, there's nothing special about inserting two rows rather than one:
insert into table (col1, col2, col3) values (@p1_1,@p1_2,@p1_3), (@p2_1,@p2_2,@p2_3)....
Simply add the parameters with the correct name and execute just as you would any other SQL.
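For a batch whose size varies per call, the statement text and parameter names can be generated rather than typed out. The following is a minimal sketch, not part of the original answers: the class MultiRowUpsert, its method names, and the @p{row}_{column} naming scheme are assumptions made for illustration. It only builds the SQL string, including the ON CONFLICT clause the question asks about, and assumes there is at least one non-key column to update.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not from the original post): builds a multi-row
// INSERT ... ON CONFLICT statement with one named parameter per value.
public static class MultiRowUpsert
{
    // Table and column names are interpolated as-is, so they must come from
    // trusted code, never from user input. Assumes at least one column other
    // than keyColumn, otherwise the SET list would be empty.
    public static string BuildSql(string table, IReadOnlyList<string> columns,
                                  string keyColumn, int rowCount)
    {
        if (rowCount < 1) throw new ArgumentOutOfRangeException(nameof(rowCount));

        // One "(@p0_0, @p0_1, ...)" group per row.
        var rows = Enumerable.Range(0, rowCount).Select(r =>
            "(" + string.Join(", ", columns.Select((_, c) => ParamName(r, c))) + ")");

        // On conflict, overwrite every non-key column with the incoming value.
        var updates = columns.Where(c => c != keyColumn)
                             .Select(c => $"{c} = EXCLUDED.{c}");

        return $"INSERT INTO {table} ({string.Join(", ", columns)}) " +
               $"VALUES {string.Join(", ", rows)} " +
               $"ON CONFLICT ({keyColumn}) DO UPDATE SET {string.Join(", ", updates)}";
    }

    // Parameter name for the value at (row, column), both zero-based.
    public static string ParamName(int row, int column) => $"@p{row}_{column}";
}
```

With Npgsql, each value would then be bound with something like cmd.Parameters.AddWithValue(MultiRowUpsert.ParamName(r, c), value) before executing the command as usual. Very large batches may need splitting, since the number of parameters a single statement can carry is limited.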

Related

Improve SQLite writing speed C#

I need to dramatically improve writing speed for SQLite (or maybe suggest another solution for this outside of SQLite).
Scenario :
I have 71 Columns with 365 * 24 * 60 values each. (365 = days)
I do "insert intos" for testing the db_performance
To shorten the testing-time I did the tests for 90 days instead of 365 (so the result-timespans will be x4)
Settings :
I've tried various PRAGMAS like
synchronous off
locking_mode exclusive
cache & pagesize with different values (though I read low values may improve performance, for me higher values did a good job)
journal_mode off
changing timeout values
Approaches :
#A1 Gathering all "insert intos", ExecuteNonquery each, at the end do one giant transaction
#A2 the same like above but with ParallelForEach and ExecuteNonqueryAsync
#A3 Gathering all "insert intos" for one day and do one transaction each
Tablestructure :
#T1 One table with all the columns
#T2 One table for each column
Results :
I did runs for 90 days ( so it doesn't take too long ) and the main problem is writing speed.
I measured 5 phases, which are :
#P1 setup the tables & headers ( ~ 8-9ms)
#P2 prepare the data (for every "insert into" command do ExecuteNonquery) ( ~ 15000-18000ms ! )
#P3 do the transaction (~ 200-500 ms)
#P4 read one complete column ( ~80 - 200 ms)
#P5 delete one complete column ( ~ 1 - 9 ms)
I tried all the different methods and approaches I mentioned before, but couldn't manage to improve #P2. Any ideas how to fix that ? Or maybe any hint for a better solution as a serverless db (Realm?) ?
Here's the code for the #A1 #P2 #T2, which had the best results so far...
using (var transaction = sqLiteConnection.BeginTransaction())
{
using (var command = sqLiteConnection.CreateCommand())
{
foreach (var vcommand in values_list)
{
command.CommandText = vcommand;
command.ExecuteNonQuery();
}
}
transaction.Commit();
}
(values_list is a string[] with 71*90 INSERT statements, or in Mark's version one giant command.)
Edit/Update :
I tried the approach by Mark Benningfield making one giant "insert into" for all values in one table with all columns and could improve the overall speed to ~8500ms (#P2 ~7500ms).
Final Update :
Ok I did a bunch of tests and will summarize the results :
For comparison reason all databases had the same values, a two-dimensional double array with [129600,71] values. None of them had a prepared insert-statement, so the generation time for transforming the values into the needed format is included (phase 2).
SQLite needs ~14 seconds with one giant transaction (the previous ~8s were without generating the insert-into-command live). SQL_CE is atm the best for this scenario. This is mainly due to not operating with strings ("INSERT INTO"), but with DataTables and rows + bulkInsert. Realm is interesting, especially for mobile users - very intuitive. But you cannot add dynamic objects atm (so you need a static object). Influx is another nice database for timeseries, but it's very specific, not embedded and has IMHO a poor C# implementation (it may perform much better via console).
Have you tried writing the data to a text file and then using the import command (see Importing CSV files)? Unlike INSERT commands, these routines usually ignore triggers and work with direct table access.
Make your insert command look like this (by constructing it however you need to):
INSERT INTO table (col1, col2, col3) VALUES (val1, 'val2', val3),
(val1, 'val2', val3),
(val2, 'val2', val3),
...
(val1, 'val2', val3);
Then execute the single insert command to do a bulk update of known data.
The fastest and recommended way to do bulk inserts is to use prepared statements with parameters. That way, a statement (command) is only parsed and prepared once, instead of having to parse it again for every row. SQLite also does not have to parse the parameter values from the command text, but they are supplied and used directly. For each row, you only switch parameters.
So instead of going this way:
using (var transaction = sqLiteConnection.BeginTransaction())
{
using (var command = sqLiteConnection.CreateCommand())
{
foreach (var vcommand in values_list)
{
command.CommandText = vcommand;
command.ExecuteNonQuery();
}
}
transaction.Commit();
}
You should do it like this:
using (var transaction = sqLiteConnection.BeginTransaction())
{
using (var command = sqLiteConnection.CreateCommand())
{
// Create command and parameters
command.CommandText = "INSERT INTO MyTable VALUES (?, ?)";
var param1 = command.Parameters.Add(null, SqliteType.Integer);
var param2 = command.Parameters.Add(null, SqliteType.Text);
foreach (var item in values_list)
{
// For each row, only update parameter values
param1.Value = item.IntProperty;
param2.Value = item.TextProperty;
command.ExecuteNonQuery();
}
}
transaction.Commit();
}
This will perform much better. The statement is only parsed on first execute. All following executes will use the already prepared statement. It also safeguards you against SQL Injection attacks: Text parameters are not inserted into the actual SQL statement string, which would allow manipulation of your statement. Instead, they are passed directly as values to SQLite. So you do not only gain performance, you have also prevented one of the most common database attack scenarios.
General rule: Never put values directly into SQL statements. Always use parameters instead.
Note: There are also other ways to create and bind parameters. This is just one example of how to do it. For example, you can also use named parameters:
// Create command and parameters
command.CommandText = "INSERT INTO MyTable VALUES (@one, @two)";
var param1 = command.Parameters.Add("@one", SqliteType.Integer);
var param2 = command.Parameters.Add("@two", SqliteType.Text);

MySQL Bulk Insert for relational table from MS.NET

I want to perform a bulk insert from a CSV file into a MySQL database using C#. I'm using MySql.Data.MySqlClient for the connection. The CSV columns map to multiple tables that depend on each other through primary key values, for example:
CSV(column & value): -
emp_name, address,country
-------------------------------
jhon,new york,usa
amanda,san diago,usa
Brad,london,uk
DB Schema(CountryTbl) & value
country_Id,Country_Name
1,usa
2,UK
3,Germany
DB Schema(EmployeeTbl)
Emp_Id(AutoIncrement),Emp_Name
DB Schema(AddressTbl)
Address_Id(AutoIncrement), Emp_Id,Address,countryid
Problem statement:
1> Read data from CSV to get the CountryId from "CountryTbl" for respective employee.
2> Insert data into EmployeeTbl and AddressTbl with CountryId
Approach 1
Go as per above problem statement steps, but that will be a performance hit (Row-by-Row read and insert)
Approach 2
Use "Bulk Insert" option "MySqlBulkLoader", but that needs csv files to read, and looks that this option is not going to work for me.
Approach 3
Use stored proc and use the procedure for upload. But I don't want to use stored proc.
Please suggest if there is any other option by which I can do bulk upload or suggest any other approach.
Unless you have hundreds of thousands of rows to upload, bulk loading (your approach 2) probably is not worth the extra programming and debugging time it will cost. That's my opinion, for what it's worth (2x what you paid for it :)
Approaches 1 and 3 are more or less the same. The difference lies in whether you issue the queries from c# or from your sp. You still have to work out the queries. So let's deal with 1.
The solutions to these sorts of problems depend on make and model of RDBMS. If you decide you want to migrate to SQL Server, you'll have to change this stuff.
Here's what you do. For each row of your employee CSV ...
... put a row into the employee table:
INSERT INTO EmployeeTbl (Emp_Name) VALUES (@emp_name);
Notice this query uses the INSERT ... VALUES form of the insert query. When this query (or any insert query) runs, it stores the auto-incremented Emp_Id value where a subsequent call to LAST_INSERT_ID() can get it.
... put a row into the address table:
INSERT INTO AddressTbl (Emp_Id, Address, countryid)
SELECT LAST_INSERT_ID() AS Emp_Id,
       @address AS Address,
       country_id AS countryid
  FROM CountryTbl
 WHERE Country_Name = @country;
Notice this second INSERT uses the INSERT ... SELECT form of the insert query. The SELECT part generates one row of data with the column values to insert.
It uses LAST_INSERT_ID() to get Emp_Id,
it uses a constant provided by your C# program for @address, and
it looks up the countryid value from your pre-existing CountryTbl.
Notice, of course, that you must use the C# Parameters.AddWithValue() method to set the values of the @ parameters in these queries. Those values come from your CSV file.
Finally, wrap each thousand rows or so of your csv in a transaction, by preceding their INSERT statements with a START TRANSACTION; statement and ending them with a COMMIT; statement. That will get you a performance improvement, and if something goes wrong the entire transaction will get rolled back so you can start over.
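Put together, the loop described above might look like the sketch below. It is an illustration under assumptions, not the answer author's code: the class EmployeeCsvLoader and its methods are made-up names, the CSV parsing assumes three plain fields with no quoted or embedded commas, and the code is written against the generic System.Data interfaces (which a MySqlConnection implements) so it does not depend on a particular driver version. Enumerable.Chunk needs .NET 6 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical helper: loads "name,address,country" lines row by row,
// committing one transaction per batch.
public static class EmployeeCsvLoader
{
    // Assumes exactly three fields and no quoted or embedded commas.
    public static (string Name, string Address, string Country) ParseLine(string line)
    {
        var parts = line.Split(',');
        if (parts.Length != 3) throw new FormatException("Expected 3 fields: " + line);
        return (parts[0].Trim(), parts[1].Trim(), parts[2].Trim());
    }

    // conn must already be open. Both INSERTs run on this one connection,
    // because LAST_INSERT_ID() is tracked per connection.
    public static void Load(IDbConnection conn, IEnumerable<string> lines, int batchSize = 1000)
    {
        foreach (var batch in lines.Chunk(batchSize))
        {
            // If an insert throws, the transaction is disposed without a
            // commit; ADO.NET providers generally roll back in that case.
            using (var tx = conn.BeginTransaction())
            {
                foreach (var line in batch) InsertRow(conn, tx, ParseLine(line));
                tx.Commit();
            }
        }
    }

    private static void InsertRow(IDbConnection conn, IDbTransaction tx,
                                  (string Name, string Address, string Country) row)
    {
        using (var emp = conn.CreateCommand())
        {
            emp.Transaction = tx;
            emp.CommandText = "INSERT INTO EmployeeTbl (Emp_Name) VALUES (@emp_name);";
            AddParam(emp, "@emp_name", row.Name);
            emp.ExecuteNonQuery();
        }
        using (var addr = conn.CreateCommand())
        {
            addr.Transaction = tx;
            addr.CommandText =
                "INSERT INTO AddressTbl (Emp_Id, Address, countryid) " +
                "SELECT LAST_INSERT_ID(), @address, country_id " +
                "FROM CountryTbl WHERE Country_Name = @country;";
            AddParam(addr, "@address", row.Address);
            AddParam(addr, "@country", row.Country);
            addr.ExecuteNonQuery();
        }
    }

    private static void AddParam(IDbCommand cmd, string name, object value)
    {
        var p = cmd.CreateParameter();
        p.ParameterName = name;
        p.Value = value;
        cmd.Parameters.Add(p);
    }
}
```

Note that a country missing from CountryTbl makes the second INSERT add zero rows without raising an error, so it may be worth checking the value returned by ExecuteNonQuery().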

SQL Server insert from flat file

I have some data that needs to be imported into SQL Server.
I have the following fields:
ID Param1 Param2
The way it needs to go into the table is not that straightforward.
It needs to go in as
ID Param1 5655 DateTime
ID Param2 5555 DateTime
As such, it needs to insert 2 records into the table for each row of the input file. I'm wondering what the best way to import the file into SQL Server is. I can do a BULK INSERT, but then the columns need to match exactly, and in my case they do not.
I am also using .NET C#. I'm wondering whether importing the file into a DataTable and then using a foreach loop to manipulate it further may be the best approach.
The question is a little unclear to me, but if I understand you correctly, there are many ways to do it. One simple way is to use a temp table:
create a temp table:
CREATE TABLE #TBL (ID int, param1 datetime, param2 datetime);
bulk insert from file into temp table
BULK INSERT #TBL FROM 'D:\data.txt' WITH (FIELDTERMINATOR = ' ');
Now you can insert into the permanent table with a query on the temp table (assuming your table structure is (ID, PARAM)):
INSERT INTO TABLE_NAME (id, PARAM)
SELECT DISTINCT T.ID, T.PARAM1
FROM #TBL AS T
UNION
SELECT DISTINCT T.ID, T.PARAM2
FROM #TBL AS T
Since you are using C#, you can make use of Table-Valued Parameters to stream in the data in any way you like. You can read a row from a file, split it apart, and pass in 2 rows instead of mapping columns 1 to 1. I detailed a similar approach in this answer:
How can I insert 10 million records in the shortest time possible?
The main difference here is that, in the while loop inside of the GetFileContents() method, you would need to call yield return twice, once for each piece.
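The "yield return twice" idea can be sketched independently of the linked answer. This is a hedged illustration only: FlatFileUnpivot and Expand are made-up names, the input is assumed to be space-separated integers in the order ID, Param1, Param2, and the DateTime column is assumed to be a load timestamp supplied by the caller. A plain tuple stands in for the record type a table-valued parameter would actually stream.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper: turns each "ID Param1 Param2" input line into the
// two output rows the target table expects.
public static class FlatFileUnpivot
{
    public static IEnumerable<(int Id, string Name, int Value, DateTime LoadedAt)> Expand(
        IEnumerable<string> lines, DateTime loadedAt)
    {
        foreach (var line in lines)
        {
            var f = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            if (f.Length != 3) continue; // skip blank or malformed lines

            int id = int.Parse(f[0], CultureInfo.InvariantCulture);

            // One input line produces two output rows, one per parameter.
            yield return (id, "Param1", int.Parse(f[1], CultureInfo.InvariantCulture), loadedAt);
            yield return (id, "Param2", int.Parse(f[2], CultureInfo.InvariantCulture), loadedAt);
        }
    }
}
```

Because the method is an iterator, rows are produced lazily as the consumer asks for them, so the whole file never has to sit in memory at once.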

how to improve SQL query performance in my case

I have a table with a very simple schema: an ID column as the unique primary key (uniqueidentifier type) and some other nvarchar columns. My current goal is, for 5000 inputs, to work out which ones are already contained in the table and which are not. The inputs are strings, and I have a C# function which converts a string into a uniqueidentifier (GUID). My logic is: if there is an existing ID, then I treat the string as already contained in the table.
My question is, if I need to find out which of the 5000 input strings are already contained in the DB and which are not, what is the most efficient way?
BTW: My current implementation converts the string to a GUID in C# code, then invokes a stored procedure which queries whether the ID exists in the database and returns the result to the C# code.
My working environment: VSTS 2008 + SQL Server 2008 + C# 3.5.
My first instinct would be to pump your 5000 inputs into a single-column temporary table X, possibly index it, and then use:
SELECT X.thecol
FROM X
JOIN ExistingTable ON ExistingTable.thecol = X.thecol
to get the ones that are present, and (if both sets are needed)
SELECT X.thecol
FROM X
LEFT JOIN ExistingTable ON ExistingTable.thecol = X.thecol
WHERE ExistingTable.thecol IS NULL
to get the ones that are absent. Worth benchmarking, at least.
Edit: as requested, here are some good docs & tutorials on temp tables in SQL Server. Bill Graziano has a simple intro covering temp tables, table variables, and global temp tables. Randy Dyess and SQL Master discuss performance issue for and against them (but remember that if you're getting performance problems you do want to benchmark alternatives, not just go on theoretical considerations!-).
MSDN has articles on tempdb (where temp tables are kept) and optimizing its performance.
Step 1. Make sure you have a problem to solve. Five thousand inserts isn't a lot to insert one at a time in a lot of contexts.
Are you certain that the simplest way possible isn't sufficient? What performance issues have you measured so far?
What do you need to do with those entries that do or don't exist in your table??
Depending on what you need, maybe the new MERGE statement in SQL Server 2008 could fit your bill - update what's already there, insert new stuff, all wrapped neatly into a single SQL statement. Check it out!
http://blogs.conchango.com/davidportas/archive/2007/11/14/SQL-Server-2008-MERGE.aspx
http://www.sql-server-performance.com/articles/dba/SQL_Server_2008_MERGE_Statement_p1.aspx
http://blogs.msdn.com/brunoterkaly/archive/2008/11/12/sql-server-2008-merge-capability.aspx
Your statement would look something like this:
MERGE INTO
(your target table) AS t
USING
(your source table, e.g. a temporary table) AS s
ON t.ID = s.ID
WHEN NOT MATCHED THEN -- new row does not exist in the base table
....(do whatever you need to do)
WHEN MATCHED THEN -- row exists in base table
... (do whatever else you need to do)
;
To make this really fast, I would load the "new" records from e.g. a TXT or CSV file into a temporary table in SQL server using BULK INSERT:
BULK INSERT YourTemporaryTable
FROM 'c:\temp\yourimportfile.csv'
WITH
(
FIELDTERMINATOR =',',
ROWTERMINATOR =' |\n'
)
BULK INSERT combined with MERGE should give you the best performance you can get on this planet :-)
Marc
PS: here's a note from TechNet on MERGE performance and why it's faster than individual statements:
In SQL Server 2008, you can perform multiple data manipulation language (DML) operations in a single statement by using the MERGE statement. For example, you may need to synchronize two tables by inserting, updating, or deleting rows in one table based on differences found in the other table. Typically, this is done by executing a stored procedure or batch that contains individual INSERT, UPDATE, and DELETE statements. However, this means that the data in both the source and target tables are evaluated and processed multiple times; at least once for each statement.
By using the MERGE statement, you can replace the individual DML statements with a single statement. This can improve query performance because the operations are performed within a single statement, therefore, minimizing the number of times the data in the source and target tables are processed. However, performance gains depend on having correct indexes, joins, and other considerations in place. This topic provides best practice recommendations to help you achieve optimal performance when using the MERGE statement.
Try to ensure you end up running only one query - i.e. if your solution consists of running 5000 queries against the database, that'll probably be the biggest consumer of resources for the operation.
If you can insert the 5000 IDs into a temporary table, you could then write a single query to find the ones that don't exist in the database.
If you want simplicity, since 5000 records is not very many, then from C# just use a loop to generate an insert statement for each of the strings you want to add to the table. Wrap the insert in a TRY CATCH block. Send em all up to the server in one shot like this:
BEGIN TRY
INSERT INTO table (theCol, field2, field3)
SELECT theGuid, value2, value3
END TRY BEGIN CATCH END CATCH
BEGIN TRY
INSERT INTO table (theCol, field2, field3)
SELECT theGuid, value2, value3
END TRY BEGIN CATCH END CATCH
BEGIN TRY
INSERT INTO table (theCol, field2, field3)
SELECT theGuid, value2, value3
END TRY BEGIN CATCH END CATCH
If you have a unique index or primary key defined on your string GUID, then the duplicate inserts will fail. Checking ahead of time whether the record exists just duplicates work that SQL Server is going to do anyway.
If performance is really important, then consider downloading the 5000 GUIDs to your local station and doing all the analysis locally. Reading 5000 GUIDs should take much less than 1 second. This is simpler than bulk importing to a temp table (which is the only way you will get performance from a temp table) and doing an update using a join to the temp table.
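The local analysis could be as simple as a hash-set lookup. The sketch below is an assumption about what "doing the analysis locally" might look like, not the answer author's code: GuidPartition and Split are made-up names, and it presumes the table's ID values have already been read into memory, which is only sensible while the table is small.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: splits input IDs into those already in the table
// and those that are not, using an in-memory set of the existing IDs.
public static class GuidPartition
{
    public static (List<Guid> Present, List<Guid> Absent) Split(
        IEnumerable<Guid> inputs, IEnumerable<Guid> existingIds)
    {
        var existing = new HashSet<Guid>(existingIds); // O(1) average lookups
        var present = new List<Guid>();
        var absent = new List<Guid>();

        // Distinct() so a repeated input is reported only once.
        foreach (var id in inputs.Distinct())
            (existing.Contains(id) ? present : absent).Add(id);

        return (present, absent);
    }
}
```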
Since you are using Sql server 2008, you could use Table-valued parameters. It's a way to provide a table as a parameter to a stored procedure.
Using ADO.NET you could easily pre-populate a DataTable and pass it as a SqlParameter.
Steps you need to perform:
Create a custom Sql Type
CREATE TYPE MyType AS TABLE
(
UniqueId INT NOT NULL,
Column NVARCHAR(255) NOT NULL
)
Create a stored procedure which accepts the Type
CREATE PROCEDURE spInsertMyType
    @Data MyType READONLY
AS
xxxx
Call using C#
SqlCommand insertCommand = new SqlCommand(
    "spInsertMyType", connection);
insertCommand.CommandType = CommandType.StoredProcedure;
SqlParameter tvpParam =
    insertCommand.Parameters.AddWithValue(
        "@Data", dataReader);
tvpParam.SqlDbType = SqlDbType.Structured;
Links: Table-valued Parameters in Sql 2008
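The DataTable mentioned above can be filled in a few lines. This is a minimal sketch with assumed names (TvpTable, FromIds) and an assumed single-column shape holding the GUIDs from the question. The DataTable's columns should line up with the columns of whatever table type the procedure declares, so the example would need adjusting for the two-column MyType shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical helper: builds a one-column DataTable of IDs that could be
// passed as a structured (table-valued) parameter.
public static class TvpTable
{
    public static DataTable FromIds(IEnumerable<Guid> ids)
    {
        var table = new DataTable();
        table.Columns.Add("ID", typeof(Guid)); // assumed column name and type

        foreach (var id in ids)
            table.Rows.Add(id); // one row per input ID

        return table;
    }
}
```

The resulting table would take the place of dataReader in the AddWithValue call above, with SqlDbType.Structured set in the same way.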
Definitely do not do it one-by-one.
My preferred solution is to create a stored procedure with one parameter that takes XML in the following format:
<ROOT>
<MyObject ID="60EAD98F-8A6C-4C22-AF75-000000000000"/>
<MyObject ID="60EAD98F-8A6C-4C22-AF75-000000000001"/>
....
</ROOT>
Then, in the procedure, the NVARCHAR(MAX) argument is converted to XML, after which you use it as a table with a single column (let's call it @FilterTable). The stored procedure looks like:
CREATE PROCEDURE dbo.sp_MultipleParams(@FilterXML NVARCHAR(MAX))
AS BEGIN
    SET NOCOUNT ON
    DECLARE @x XML
    SELECT @x = CONVERT(XML, @FilterXML)
    -- table variable (must have it, because we cannot join on the XML directly)
    DECLARE @FilterTable TABLE (
        "ID" UNIQUEIDENTIFIER
    )
    -- insert into the table variable
    -- important: XML iS CaSe-SenSiTiVe
    INSERT @FilterTable
    SELECT x.value('@ID', 'UNIQUEIDENTIFIER')
    FROM @x.nodes('/ROOT/MyObject') AS R(x)
    SELECT o.ID,
           SIGN(SUM(CASE WHEN t.ID IS NULL THEN 0 ELSE 1 END)) AS FoundInDB
    FROM @FilterTable o
    LEFT JOIN dbo.MyTable t
        ON o.ID = t.ID
    GROUP BY o.ID
END
GO
You run it as:
EXEC sp_MultipleParams '<ROOT><MyObject ID="60EAD98F-8A6C-4C22-AF75-000000000000"/><MyObject ID="60EAD98F-8A6C-4C22-AF75-000000000002"/></ROOT>'
And your results look like:
ID FoundInDB
------------------------------------ -----------
60EAD98F-8A6C-4C22-AF75-000000000000 1
60EAD98F-8A6C-4C22-AF75-000000000002 0
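On the C# side, the XML argument can be generated from the list of GUIDs instead of being concatenated by hand. The following is a sketch under assumptions: FilterXml and Build are made-up names, and the element and attribute names simply mirror the ROOT/MyObject/ID format shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: serializes IDs into the <ROOT><MyObject ID="..."/></ROOT>
// shape the stored procedure expects.
public static class FilterXml
{
    public static string Build(IEnumerable<Guid> ids)
    {
        var root = new XElement("ROOT",
            ids.Select(id => new XElement("MyObject",
                // "D" is the hyphenated 32-digit form; upper-cased to match the sample
                new XAttribute("ID", id.ToString("D").ToUpperInvariant()))));

        // Single line, no indentation, to keep the payload small.
        return root.ToString(SaveOptions.DisableFormatting);
    }
}
```

Passing the resulting string as a regular command parameter for @FilterXML, rather than splicing it into the EXEC text, keeps the call safe from SQL injection.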

Enhance performance of large slow dataloading query

I'm trying to load data from Oracle into SQL Server (sorry for not writing this before).
I have a table (actually a view which has data from different tables) with at least 1 million records. I designed my package in such a way that I have functions for the business logic and call them directly in the SELECT query.
Ex:
X1(id varchar2)
x2(id varchar2, d1 date)
x3(id varchar2, d2 date)
Select id, x, y, z, decode (.....), x1(id), x2(id), x3(id)
FROM Table1
Note: My table has 20 columns and I call 5 different functions on at least 6-7 columns.
Some of the functions compare the parameters passed in with an audit table and perform logic.
How can I improve the performance of my query, or is there a better way to do this?
I tried doing it in C# code, but the initial select of records is too large for a DataSet and I get an OutOfMemory exception.
my function does selects and then performs logic for example:
Function(c_x2, eid)
Select col1
into p_x1
from tableP
where eid = eid;
IF (p_x1 = NULL) THEN
ret_var := 'INITIAL';
ELSIF (p_x1 = 'L') AND (c_x2 = 'A') THEN
ret_var:= 'RL';
INSERT INTO Audit
(old_val, new_val, audit_event, id, pname)
VALUES
(p_x1, c_x2, 'RL', eid, 'PackageProcName');
ELSIF (p_x1 = 'A') AND (c_x2 = 'L') THEN
ret_var := 'GL';
INSERT INTO Audit
(old_val, new_val, audit_event, id, pname)
VALUES
(p_x1, c_x2, 'GL', eid, 'PackgProcName');
END IF;
RETURN ret_var;
I'm getting each row, performing the logic in C#, and then inserting.
If possible INSERT from the SELECT:
INSERT INTO YourNewTable
(col1, col2, col3)
SELECT
col1, col2, col3
FROM YourOldTable
WHERE ....
this will run significantly faster than a single query where you then loop over the result set and have an INSERT for each row.
EDIT as for the OP question edit:
you should be able to replace the function call to plain SQL in your query. Mimic the "initial" using a LEFT JOIN tableP, and the "RL" or "GL" can be calculated using CASE.
EDIT based on OP recent comments:
since you are loading data from Oracle into SQL Server, this is what I would do: most people that could help have moved on and will not read this question again, so open a new question where you say: 1) you need to load data from Oracle (version) to SQL Server Version 2) currently you are loading it from one query processing each row in C# and inserting it into SQL Server, and it is slow. and all the other details. There are much better ways of bulk loading data into SQL Server. As for this question, you could accept an answer, answer yourself where you explain you need to ask a new question, or just leave it unaccepted.
My recommendation is that you do not use functions and then call them within other SELECT statements. This:
SELECT t.id, ...
x1(t.id) ...
FROM TABLE t
...is equivalent to:
SELECT t.id, ...
(SELECT x.column FROM x1 x WHERE x.id = t.id)
FROM TABLE t
Encapsulation doesn't work in SQL like when using C#/etc. While the approach makes maintenance easier, performance suffers because sub selects will execute for every row returned.
A better approach would be to update the supporting function to include the join criteria (IE: "where x.id = t.id" for lack of real one) in the SELECT:
SELECT x.id,
       x.column
FROM x1 x
...so you can use it as a JOIN:
SELECT t.id, ...
x1.column
FROM TABLE t
JOIN (SELECT x.id,
x.column
FROM MY_PACKAGE.x) x1 ON x1.id = t.id
I prefer that to having to incorporate the function logic into the queries for sake of maintenance, but sometimes it can't be helped.
Personally, I'd create an SSIS import to do this task. Using a bulk insert you can improve speed dramatically, and SSIS can handle the functions part after the bulk insert.
Firstly you need to find where the performance problem actually is. Then you can look at trying to solve it.
What is the performance of the view like? How long does it take the view to execute without any of the function calls? Try running the command
create table the_view_table
as
select *
from the_view;
How well does it perform? Does it take 1 minute or 1 hour?
How well do the functions perform? According to the description you are making approximately 5 million function calls. They had better be pretty efficient! Also, are the functions defined as deterministic? If the functions are defined using the DETERMINISTIC keyword, then Oracle has a chance of optimizing away some of the calls.
Is there a way of reducing the number of function calls? The functions are being called once the view has been evaluated and the million rows of data are available. But are all the input values from the highest level of the query? Can the function calls be embedded into the view at a lower level? Consider the following two queries. Which would be quicker?
select
f.dim_id,
d.dim_col_1,
long_slow_function(d.dim_col_2) as dim_col_2
from large_fact_table f
join small_dim_table d on (f.dim_id = d.dim_id)
select
f.dim_id,
d.dim_col_1,
d.dim_col_2
from large_fact_table f
join (
    select
        dim_id,
        dim_col_1,
        long_slow_function(dim_col_2) as dim_col_2
    from small_dim_table) d on (f.dim_id = d.dim_id)
Ideally the second query should run quicker, as it calls the function fewer times.
The performance issue could be in any of these places and until you investigate the issue, it would be difficult to know where to direct your tuning efforts.
A couple of tips:
Don't load all records into RAM but process them one by one.
Try to run as many functions on the client as possible. Databases are really slow to execute user defined functions.
If you need to join two tables, it's sometimes possible to create two connections on the client. Fetch the main data with connection 1 and the audit data with connection 2. Order the data for both connections in the same way so you can read single records from both connections and perform whatever you need on them.
If your functions always return the same result for the same input, use a computed column or a materialized view. The database will run the function once and save it in a table somewhere. That will make INSERT slow but SELECT quick.
Create a sorted index on your table.
Introduction to SQL Server Indexes; other RDBMSs are similar.
Edit since you edited your question:
Using a view is even more sub-optimal, especially when querying single rows from it. I think your "business functions" are actually something like stored procedures?
As others suggested, in SQL always go set based. I assumed you already did that, hence my tip to start using indexing.
