C# vs Python query against SQLite database

I have a SQLite database that has a single table with 18 million rows and 24 columns. The schema is along the lines of,
Date (VARCHAR)
CompanyName (VARCHAR)
Amount (REAL)
AggCode (VARCHAR)
Level1 ... Level20 (VARCHAR)
I am querying the table two ways - first with a Python script, and then with a C# function that is exposed to Excel with ExcelDNA (ultimately I'd prefer to use Excel to run the queries as some queries will return rows of data that need to be further manipulated).
I find that Python usually outperforms the Excel add-in by a factor of 3-5x, and I was wondering if there was something wrong with my code. Sample query below,
query = "SELECT Sum(Amount) FROM my WHERE Level9='STIN_J' AND (AggCode='R_REAL' AND Date='05DEC2016')"
The queries are usually run combining the fields Level9, Level5, AggCode, Date, CompanyName in the WHERE clause. So apart from the raw table, I have also configured the following four indices,
CREATE INDEX idx1 on my(Level9, AggCode);
CREATE INDEX idx2 on my(Level5, AggCode);
CREATE INDEX idx3 on my(CompanyName, AggCode);
CREATE INDEX idx4 on my(Date, AggCode);
This is the sample Python code to run a query,
import sqlite3 as lite
...
con = lite.connect(r"C:\temp\my.db")  # raw string, so the backslashes are not escape sequences
cur = con.cursor()
cur.execute(query)
data = cur.fetchall()  # note the parentheses: fetchall is a method
for row in data:
    for i in range(len(row)):
        print row[i],
        print "\t",
On the whole this code works rather well.
This is the sample C# code to run a query,
using System.Data.SQLite;
...
string constr = @"Data Source=C:\temp\my.db;Version=3;";  // verbatim string, so \t is not an escape sequence
SQLiteConnection conn = new SQLiteConnection(constr);
SQLiteCommand command = new SQLiteCommand(query, conn);
conn.Open();
SQLiteDataAdapter sda = new SQLiteDataAdapter(command);
DataTable dt = new DataTable();
sda.Fill(dt);
sda.Dispose();
command.Dispose();
conn.Dispose();
object[,] ret = new object[dt.Rows.Count, dt.Columns.Count];
int rowCount = 0;
foreach (DataRow row in dt.Rows)
{
    int colCount = 0;  // reset for each row
    foreach (DataColumn col in dt.Columns)
    {
        ret[rowCount, colCount] = row[col];  // the cell value, not col.ColumnName
        colCount++;
    }
    rowCount++;
}
...
return ret;
Is either the Python or the C# code sub-optimal? For example, should I use SQLiteDataReader instead of SQLiteDataAdapter? I would appreciate any thoughts.
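For reference, here is what a reader-based version of the same call might look like. This is only a minimal sketch; the connection string and the query variable are taken from the snippets above:
using (var conn = new SQLiteConnection(@"Data Source=C:\temp\my.db;Version=3;"))
using (var command = new SQLiteCommand(query, conn))
{
    conn.Open();
    using (SQLiteDataReader reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            // For an aggregate query this loop runs once;
            // GetValue(i) returns each column of the current row.
            for (int i = 0; i < reader.FieldCount; i++)
                Console.Write(reader.GetValue(i) + "\t");
            Console.WriteLine();
        }
    }
}
This skips the DataTable entirely, which avoids one full copy of the result set.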
The result sets themselves are pretty small, in some cases just a single number, so I wouldn't have thought that ExcelDNA was adding overhead to the process. A sample Python query takes about 15 seconds, whereas the C# version takes up to a minute.
Finally, how would amending the PRAGMA settings affect performance? Any suggestions on some generic settings, given that my top priority is query speed?
Also, any suggestions on how to actually apply these settings in Python or C#, or persist them, would be much appreciated.
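For what it's worth, PRAGMAs are just statements executed on an open connection, so they can be applied the same way from Python or C#. A minimal C# sketch follows; the specific values are illustrative assumptions, not tuned recommendations:
using (var conn = new SQLiteConnection(@"Data Source=C:\temp\my.db;Version=3;"))
{
    conn.Open();
    using (var pragma = conn.CreateCommand())
    {
        pragma.CommandText = "PRAGMA cache_size = -200000;";  // per connection, roughly 200 MB, not persisted
        pragma.ExecuteNonQuery();
        pragma.CommandText = "PRAGMA temp_store = MEMORY;";   // per connection
        pragma.ExecuteNonQuery();
        pragma.CommandText = "PRAGMA journal_mode = WAL;";    // persistent: stored in the database file
        pragma.ExecuteNonQuery();
    }
    // ... run queries on this same connection ...
}
Note that most PRAGMAs last only for the life of the connection; journal_mode is one of the few that persists in the database file itself.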

I wasn't disposing of the DataTable, the SQL command, or the connection properly. Once I put the Dispose calls into a finally block, the C# code sped up drastically, outperforming Python in most cases.
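For anyone hitting the same thing: putting Dispose in a finally block is equivalent to wrapping the objects in using blocks. A sketch of the same adapter code from the question:
var dt = new DataTable();
using (var conn = new SQLiteConnection(constr))
using (var command = new SQLiteCommand(query, conn))
using (var sda = new SQLiteDataAdapter(command))
{
    conn.Open();
    sda.Fill(dt);
}  // conn, command and sda are disposed here even if Fill throws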

Related

Oracle.ManagedDataAccess.Client.OracleCommand.ExecuteReader Missing Results

I have a very simple method that utilizes Oracle.ManagedDataAccess to query datatables in Oracle. The code is below.
private System.Data.DataTable ByQuery(Oracle.ManagedDataAccess.Client.OracleConnection connection, string query)
{
    using (var cmd = new Oracle.ManagedDataAccess.Client.OracleCommand())
    {
        cmd.Connection = connection;
        cmd.CommandText = query;
        cmd.CommandType = System.Data.CommandType.Text;
        var dr = cmd.ExecuteReader();
        dr.Read();
        var dataTable = new System.Data.DataTable();
        dataTable.Load(dr);
        var recordCount = dataTable.Rows.Count;
        return dataTable;
    }
}
Using a very simple query such as:
SELECT * FROM NKW.VR_ORDER_LI WHERE CTRL_NO = 10
returns 32 rows of data. However, when I run the exact same query from Oracle SQL Developer, using the exact same user account as in my C# app's connection string, I get 33 results.
I'm consistently missing a single row of data.
I've tried querying a different CTRL_NO:
SELECT * FROM NKW.VR_ORDER_LI WHERE CTRL_NO = 17
.Net returns 8 results.
Oracle Sql Developer returns 9 results.
I've tried removing the WHERE statement and just getting all results.
Still 1 row difference between the two.
I've tried googling for an answer but haven't been successful. Any help or advice would be appreciated.
UPDATE 1:
I've determined that I'm always missing the first result that I see in Oracle SQL Developer when I run the exact same query from my C# App.
UPDATE 2:
As suggested I took the DataTable out of the equation.
int rowCount = 0;
while (dr.Read())
{
    rowCount++;
}
Counting rows this way, skipping the DataTable, still results in a missing record.
UPDATE 3:
Tested against a completely different table (NKW.VR_ORDER_LI is actually a view). Still the same result: for some reason I end up with one less row from ExecuteReader() than I do from within SQL Developer.
I ended up figuring out my issue from this thread:
Datareader skips first result
So the culprit in all of this was this part of the code:
var dr = cmd.ExecuteReader();
dr.Read();
var dataTable = new System.Data.DataTable();
dataTable.Load(dr);
The first dr.Read() wasn't necessary; it advanced the reader and consumed the first row before DataTable.Load ever saw it. Getting rid of this line of code solved the problem.
Final fix:
var dr = cmd.ExecuteReader();
var dataTable = new System.Data.DataTable();
dataTable.Load(dr);  // Load advances the reader itself, so no preceding dr.Read()
I also went back to using the DataTable because it is more consistent with how we interact with transactional data throughout our project at the current time.

Boost select statement performance of sqlite database in C#

I have a SQLite database consisting of 50 columns and more than 1.2 million rows. I'm using System.Data.SQLite with Visual Studio 2013 (C#).
I'm using very simple code to retrieve data from the database, but it is taking too much time.
private SQLiteConnection sqlite;

public MySqlite(string path)
{
    sqlite = new SQLiteConnection("Data Source=" + path + "\\DBName.sqlite");
}

public DataTable selectQuery(string query)
{
    SQLiteDataAdapter ad;
    DataTable dt = new DataTable();
    try
    {
        SQLiteCommand cmd;
        sqlite.Open();
        cmd = sqlite.CreateCommand();
        cmd.CommandText = query;  // set the passed query
        ad = new SQLiteDataAdapter(cmd);
        ad.Fill(dt);  // fill the data table
    }
    catch (SQLiteException ex)
    {
        // exception code here.
    }
    sqlite.Close();
    return dt;
}
And, the select statement is:
select * from table
As I said, it is very simple code.
I want to know how to boost the SELECT performance. For this code the process takes up to one minute, which I want to get down to less than one second.
Another thing: there seem to be some settings for configuring a SQLite database, but I don't know where to apply them. Could someone tell me how to configure a SQLite database with System.Data.SQLite?
Consider narrowing your result set by getting necessary columns or paging.
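For example, something along these lines (a sketch; the column names are placeholders, since the question only shows select * from table, and it reuses the sqlite connection field from the question, assumed open):
// Select only the needed columns and fetch one page at a time.
string pagedQuery = "SELECT Col1, Col2 FROM table LIMIT @pageSize OFFSET @offset";
using (var cmd = new SQLiteCommand(pagedQuery, sqlite))
{
    cmd.Parameters.AddWithValue("@pageSize", 1000);
    cmd.Parameters.AddWithValue("@offset", 0);  // advance by pageSize for each page
    using (var ad = new SQLiteDataAdapter(cmd))
    {
        DataTable page = new DataTable();
        ad.Fill(page);  // at most 1000 rows instead of 1.2 million
    }
}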

Very slow batching update with MySQL .Net/Connector 6.5.4/6.6.4

I am using MySQL 5.6.9-rc with .NET Connector 6.5.4 to insert data into a table that has two fields (integer ID, integer Data; ID is the primary key). It is very slow (about 35 seconds) to insert 2000 rows into the table (not much difference between UpdateBatchSize = 1 and UpdateBatchSize = 500). I also tried Connector 6.6.4; the problem remains.
However, it is fast with MySQL 5.4.3 and Connector 6.2.0: it took just one second to insert 2000 rows when UpdateBatchSize is set to 500 (it's also slow if UpdateBatchSize = 1). I then tested with MySQL 5.4.3 and Connector 6.5.4 or 6.6.4: it is slow!
I wrote the code below to insert the data, and ran it with MySQL 5.6.9 and Connector 6.5.4, on Windows XP and VS2008.
public void Test()
{
    MySqlConnection conn = new MySqlConnection("Database=myDatabase;Server=localhost;User Id=root;Password=myPassword");
    string sql = "Select * from myTable";
    MySqlDataAdapter adapter = new MySqlDataAdapter(sql, conn);
    adapter.UpdateBatchSize = 500;
    MySqlCommandBuilder commandBuilder = new MySqlCommandBuilder(adapter);
    DataTable table = new DataTable();
    adapter.Fill(table);  // it is an empty table
    Add2000RowsToTable(table);
    int count = adapter.Update(table);  // this took 35 seconds to complete
    adapter.Dispose();
    conn.Close();
}

private void Add2000RowsToTable(DataTable table)
{
    DataRow row;
    for (int i = 0; i < 2000; i++)
    {
        row = table.NewRow();
        row[0] = i;
        row[1] = i;
        table.Rows.Add(row);
    }
}
It seems to me that MySqlDataAdapter.UpdateBatchSize is not functional with Connector 6.5.4 and 6.6.4. Is something wrong with my code?
Thanks in advance
Although this takes a bit of initial coding (and doesn't solve your issue directly), I highly recommend using LOAD DATA INFILE for anything longer than maybe 100 records.
In fact, in my own system, I've coded it once and I reuse it for all my inserts and updates, whether bulk or not.
LOAD DATA INFILE is much more scalable: I've used it to insert 100 million rows without noticeable performance degradation.
Did some more testing...
Checking the logs on the MySQL server: for Connector 6.2.0, it generates the SQL statements for batched updates like this:
insert into mytable (id, data) values (0,0),(1,1),(2,2) ...
but for Connector 6.5.4 and 6.6.4, the statements are different:
insert into mytable (id, data) values (0,0); insert into mytable (id, data) values (1,1); insert into mytable (id, data) values (2,2); ...
I think this is why the batched update is so slow with Connector 6.5.4/6.6.4. Is this a bug in 6.5.4/6.6.4, or should the server (I tried MySQL 5.5.29/5.6.9) handle the statements more intelligently?
The solution I went with was to write the bulk row data as CSV into a file and then import it using the following command:
LOAD DATA LOCAL INFILE 'C:/path/to/file.csv'
INTO TABLE <tablename>
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(<field1>,<field2>);
This only took about 4 seconds for 30000 rows. It's similar to the recommendation above, but lets you use a file local to your system rather than on the server.
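If you want to drive that from C#, it runs like any other statement. A sketch (the path, table and column names are placeholders, and newer Connector/NET versions may also require AllowLoadLocalInfile=true in the connection string):
using (var conn = new MySqlConnection("Database=myDatabase;Server=localhost;User Id=root;Password=myPassword"))
{
    conn.Open();
    string load =
        "LOAD DATA LOCAL INFILE 'C:/path/to/file.csv' " +
        "INTO TABLE myTable " +
        "FIELDS TERMINATED BY ',' ENCLOSED BY '\"' " +
        "LINES TERMINATED BY '\\n' " +
        "(id, data)";
    using (var cmd = new MySqlCommand(load, conn))
    {
        cmd.ExecuteNonQuery();  // one round trip for the whole file
    }
}
Connector/NET also ships a MySqlBulkLoader class that wraps this same statement, if you prefer not to build the SQL by hand.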

How to update a DataTable in C# with code?

I want to move data from one database to another. I wrote two functions.
In function 1, I fill a DataTable named Dt from a table in database 1.
In function 2, I fill a DataTable named dtnull from the corresponding table in database 2, import the rows from Dt into it,
and then update database 2 with dtnull.
function 2:
{
    SqlDataAdapter sda = new SqlDataAdapter();
    sda.SelectCommand = new SqlCommand();
    sda.SelectCommand.Connection = objconn;
    sda.SelectCommand.CommandText = "Select * from " + TableName + "";
    DataTable dtnull = new DataTable();
    sda.Fill(dtnull);
    SqlCommandBuilder Builder = new SqlCommandBuilder();
    Builder.DataAdapter = sda;
    Builder.ConflictOption = ConflictOption.OverwriteChanges;
    string insertCommandSql = Builder.GetInsertCommand(true).CommandText;
    foreach (DataRow Row in Dt.Rows)
    {
        dtnull.ImportRow(Row);
    }
    sda.Fill(dtnull);
    sda.Update(dtnull);
}
If you need to copy a whole SQL database, just back it up and restore it. Alternatively, use DTS services.
If it's just a few tables, I think you can:
1. Right-click on the table you want in SQL Management Studio.
2. Generate a create script to your clipboard.
3. Execute it.
4. Go back to your original table and select all the rows.
5. Copy them.
6. Go to your new table and paste.
No need to make this harder than it is.
You don't really need to use an update for this. You might try out this solution; it might be the easiest way for you to do this.
http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy.aspx
If you would like a LINQ solution, I could provide you with one.
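For completeness, the SqlBulkCopy route from the link above is only a few lines. A sketch (the destination connection string is a placeholder; Dt and TableName are the names used in the question):
using (var bulkCopy = new SqlBulkCopy(destinationConnectionString))
{
    bulkCopy.DestinationTableName = TableName;  // target table in database 2
    bulkCopy.WriteToServer(Dt);                 // streams all rows in one bulk operation
}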
There is a lot that is left unexplained. For example, do the source table and target table have the same column structure?
Can you see both databases from the same SqlConnection (i.e., are they on the same machine)? If so, you can do it all in one SQL statement. Assuming you want to copy the data from table T1 in database DB1 to table T2 in database DB2, you would write:
insert DB2.dbo.T2 select * from DB1.dbo.T1
Execute it using ExecuteNonQuery.
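In code that is just (a sketch, reusing the objconn connection from the question):
using (var cmd = new SqlCommand("insert DB2.dbo.T2 select * from DB1.dbo.T1", objconn))
{
    cmd.ExecuteNonQuery();  // the server copies the rows; they never travel to the client
}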
If the databases require different SqlConnections, I would read the data from the source using a SqlDataReader and update the target row by row. I think it would be faster than using a SqlDataAdapter and DataTable since they require more structure and memory. The Update command writes the data row by row in any event.
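A sketch of that reader-based approach with two connections (all table, column and connection names are placeholders):
using (var src = new SqlConnection(sourceConnStr))
using (var dst = new SqlConnection(targetConnStr))
{
    src.Open();
    dst.Open();
    using (var select = new SqlCommand("SELECT Id, Name FROM T1", src))
    using (var reader = select.ExecuteReader())
    using (var insert = new SqlCommand("INSERT INTO T2 (Id, Name) VALUES (@id, @name)", dst))
    {
        var pId = insert.Parameters.Add("@id", SqlDbType.Int);
        var pName = insert.Parameters.Add("@name", SqlDbType.NVarChar, 100);
        while (reader.Read())  // stream the source; only one row in memory at a time
        {
            pId.Value = reader.GetValue(0);
            pName.Value = reader.GetValue(1);
            insert.ExecuteNonQuery();
        }
    }
}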

How to know when to stop filling an OracleDataAdapter

I'm using the ODP.NET DLL in a project that is accessing Oracle.
Users can type any SQL into a text box, which is then executed against the DB. I've been trying to use the OracleDataAdapter to populate a DataTable with the result set, but I want to be able to return the result set in stages (for large select queries).
An example of my problem is...
If a select query returns 13 rows of data, the code snippet below executes without issue until the fourth call to oda.Fill (where the start row, 15, doesn't exist), I presume because it is calling into a reader that has closed, or something similar.
It then will throw a System.InvalidOperationException with the message - Operation is not valid due to the current state of the object.
How can I find out how many rows in total the command will eventually contain (so that I don't encounter the exception)?
OracleDataAdapter oda = new OracleDataAdapter(oracleCommand);
oda.Requery = false;
DataTable dt = new DataTable();  // declared before it is put into the array
var dts = new DataTable[] { dt };
oda.Fill(0, 5, dts);
var a = dts[0].Rows.Count;
oda.Fill(a, 5, dts);
var b = dts[0].Rows.Count;
oda.Fill(b, 5, dts);
var c = dts[0].Rows.Count;
oda.Fill(c, 5, dts);
var d = dts[0].Rows.Count;
Note: I've omitted the connection and oracle command objects for brevity.
EDIT 1:
I've just realised I could wrap the SQL entered by the user in another query and execute it...
SELECT COUNT(*) FROM (...intial query in here...)
but this isn't exactly a clean solution, and surely there is a method somewhere that I haven't seen?
Thanks in advance.
For paging in Oracle, see: http://www.oracle.com/technology/oramag/oracle/06-sep/o56asktom.html
There is no way to know the record set count without running a separate count(*) query. This is by design. The DataReader and DataAdapter are forward-only, read-only.
If efficiency is a concern (i.e., large record sets), one should let the database do the paging and not ask the OracleDataAdapter to run the full query. Imagine if Google filled a DataTable with all 1M+ results for each user search! The following article addresses this concern, although the examples are in SQL:
http://www.asp.net/data-access/tutorials/efficiently-paging-through-large-amounts-of-data-cs
I've revised my example below to allow paging on any SQL query. The calling procedure is responsible for keeping track of the user's current page and page size. If the result set is smaller than the requested page size, there are no more pages.
Of course, running custom SQL from user input is a huge security risk. But that wasn't the question at hand.
Good luck! --Brett
DataTable GetReport(string sql, int pageIndex, int pageSize)
{
    DataTable table = new DataTable();
    int rowStart = pageIndex * pageSize + 1;
    int rowEnd = (pageIndex + 1) * pageSize;
    string qry = string.Format(
        @"select *
          from (select rownum ""ROWNUM"", a.*
                from ({0}) a
                where rownum <= :rowEnd)
          where ""ROWNUM"" >= :rowStart
        ", sql);
    try
    {
        using (OracleConnection conn = new OracleConnection(_connStr))
        {
            OracleCommand cmd = new OracleCommand(qry, conn);
            cmd.Parameters.Add(":rowEnd", OracleDbType.Int32).Value = rowEnd;
            cmd.Parameters.Add(":rowStart", OracleDbType.Int32).Value = rowStart;
            cmd.CommandType = CommandType.Text;
            conn.Open();
            OracleDataAdapter oda = new OracleDataAdapter(cmd);
            oda.Fill(table);
        }
    }
    catch (Exception)
    {
        throw;
    }
    return table;
}
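Calling it then looks something like this (a sketch; the caller tracks the page index, and a short page signals the end):
int pageIndex = 0;
const int pageSize = 50;
DataTable page;
do
{
    page = GetReport(userSql, pageIndex++, pageSize);
    // ... hand the page to the UI ...
} while (page.Rows.Count == pageSize);  // fewer rows than pageSize means it was the last page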
You could add an Analytic COUNT to your query:
SELECT foo, bar, COUNT(*) OVER () TheCount FROM your_table WHERE ...;
That way the count of the entire result set is returned with each row in TheCount, and you can set your loop to terminate accordingly.
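Reading that column off the first batch tells you up front how many fills are safe. A sketch reusing the adapter and array from the question (THECOUNT is how Oracle returns the unquoted TheCount alias):
oda.Fill(0, 5, dts);
int total = dts[0].Rows.Count > 0
    ? Convert.ToInt32(dts[0].Rows[0]["THECOUNT"])
    : 0;
for (int start = dts[0].Rows.Count; start < total; start += 5)
{
    oda.Fill(start, 5, dts);  // never asks for rows past the known total
}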
To gain control over the fill loop, you need to own the loop.
So build your own function to fill the DataTable using an OracleDataReader.
To get column information, you can use dataReader.GetSchemaTable.
To fill the table:
MyTable.BeginLoadData()
Dim values(mySchema.Rows.Count - 1) As Object
Do While myReader.Read()
    myReader.GetValues(values)
    MyTable.Rows.Add(values)
    ' Include your control over the loaded row count here
Loop
MyTable.EndLoadData()
