How to batch inserts via Impala ODBC? - c#

I have been querying and inserting data from and to Impala via ODBC, but it is slow (at least compared to Postgres or SQL Server), and the ODBC driver only seems to allow executing queries one by one, which is not recommended because every insert creates a new file in HDFS.
I read through the ODBC docs available on the Simba site and the Cloudera site, but batch operations or direct SQL execution are not mentioned.
This is the code I have tried so far:
static void Main(string[] args)
{
string insert = $"INSERT INTO tbl(version, snapshot) " +
$"VALUES(?, ?)";
OdbcConnection connection = new OdbcConnection("DSN=connection");
connection.Open();
using (OdbcCommand insertCommand = new OdbcCommand(insert, connection))
{
for (int i = 10; i < 15; i++)
{
List<OdbcParameter> parameters = new List<OdbcParameter>();
OdbcParameter versionParam = new OdbcParameter("@version", OdbcType.Text);
versionParam.Value = "bla" + i;
parameters.Add(versionParam);
OdbcParameter snapshotParam = new OdbcParameter("@snapshot", OdbcType.Text);
snapshotParam.Value = "blabla" + i;
parameters.Add(snapshotParam);
insertCommand.Parameters.AddRange(parameters.ToArray());
}
string query = insertCommand.CommandText.ToString();
Console.WriteLine(query);
//insertCommand.ExecuteReader();
insertCommand.ExecuteNonQuery();
}
}
Only a single row is inserted, although 5 should be. What am I doing wrong?

The .ExecuteNonQuery() call needs to be inside the for loop. On the other hand, the code that creates the parameters must be outside the for loop; inside the loop, you should only set the parameters' values, not keep re-creating them over and over again.
Try this code:
static void Main(string[] args)
{
string insert = $"INSERT INTO tbl(version, snapshot) VALUES(?, ?)";
OdbcConnection connection = new OdbcConnection("DSN=connection");
connection.Open();
using (OdbcCommand insertCommand = new OdbcCommand(insert, connection))
{
// *create* the parameters *outside* the "for" loop - only once!
List<OdbcParameter> parameters = new List<OdbcParameter>();
OdbcParameter versionParam = new OdbcParameter("@version", OdbcType.Text);
parameters.Add(versionParam);
OdbcParameter snapshotParam = new OdbcParameter("@snapshot", OdbcType.Text);
parameters.Add(snapshotParam);
insertCommand.Parameters.AddRange(parameters.ToArray());
for (int i = 10; i < 15; i++)
{
// inside the "for" loop - only set the values of the parameters
versionParam.Value = "bla" + i;
snapshotParam.Value = "blabla" + i;
// ... and then *execute* the query to run the insert!
string query = insertCommand.CommandText.ToString();
Console.WriteLine(query);
insertCommand.ExecuteNonQuery();
}
}
}
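The loop above still sends one INSERT per row, and the question notes that each one creates a new file in HDFS. Impala's INSERT ... VALUES syntax accepts several rows in one statement, so one possible direction is to build a single multi-row statement with positional `?` markers. The following is only a sketch: `MultiRowInsert`, `BuildSql` and `FlattenValues` are hypothetical helper names, and whether the Simba ODBC driver accepts parameter markers in a multi-row VALUES list is an assumption that should be verified against your driver version.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of any ODBC API.
public static class MultiRowInsert
{
    // Builds "INSERT INTO tbl(a, b) VALUES (?, ?), (?, ?), ..." for rowCount rows.
    // Table and column names are interpolated into the SQL text, so they must
    // come from trusted code, never from user input.
    public static string BuildSql(string table, IReadOnlyList<string> columns, int rowCount)
    {
        if (rowCount < 1) throw new ArgumentOutOfRangeException(nameof(rowCount));
        string oneRow = "(" + string.Join(", ", columns.Select(_ => "?")) + ")";
        string allRows = string.Join(", ", Enumerable.Repeat(oneRow, rowCount));
        return $"INSERT INTO {table}({string.Join(", ", columns)}) VALUES {allRows}";
    }

    // Flattens the rows into one value list, in the same order as the "?" markers.
    public static List<object> FlattenValues(IEnumerable<object[]> rows)
    {
        return rows.SelectMany(r => r).ToList();
    }
}
```

To use it, the generated text would become the `OdbcCommand.CommandText`, and each flattened value would be added as one `OdbcParameter` in order (ODBC binds parameters by position, not by name), followed by a single `ExecuteNonQuery()` call for the whole batch.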

Related

This C# / sql query code takes a lot of time to update the table

Can anyone help improve performance? Updating the table takes a lot of time.
I am updating the serial number from a DataGridView to a table called dbo.json.
// UPDATE dbo.json with numbers
private void BtnUpdateSql_Click(object sender, EventArgs e)
{
string VAL1;
string VAL2;
foreach (DataGridViewRow row in DgvWhistlSorted.Rows)
if (string.IsNullOrEmpty(row.Cells[5].Value as string))
{
}
else
{
for (int i = 0; i <= DgvWhistlSorted.Rows.Count - 2; i++)
{
VAL1 = DgvWhistlSorted.Rows[i].Cells[6].Value.ToString();
VAL2 = DgvWhistlSorted.Rows[i].Cells[0].Value.ToString();
var cnn = ConfigurationManager.ConnectionStrings["sql"].ConnectionString;
using (var con = new SqlConnection(cnn))
{
SqlCommand cmd = new SqlCommand();
cmd.CommandType = CommandType.Text;
cmd.CommandText = "UPDATE dbo.json SET RowN = @VAL1 WHERE [A-order] = @VAL2";
cmd.Parameters.AddWithValue("@VAL1", VAL1);
cmd.Parameters.AddWithValue("@VAL2", VAL2);
cmd.Connection = con;
con.Open();
cmd.ExecuteNonQuery();
con.Close();
}
}
}
MessageBox.Show("dbo.json is ready");
}
You shouldn't create the connection and command inside such a tight loop - create and open the connection and command ONCE before the loop, and in the loop, only set the parameter values and execute the query for each entry.
Something like this:
// UPDATE dbo.json with numbers
private void BtnUpdateSql_Click(object sender, EventArgs e)
{
string VAL1;
string VAL2;
// define connection string, query text *ONCE* before the loop
string cnn = ConfigurationManager.ConnectionStrings["sql"].ConnectionString;
string updateQuery = "UPDATE dbo.json SET RowN = @VAL1 WHERE [A-order] = @VAL2;";
// create connection and command *ONCE*
using (SqlConnection con = new SqlConnection(cnn))
using (SqlCommand cmd = new SqlCommand(updateQuery, con))
{
// Define parameters - adapt as needed (don't know the actual datatype they have)
cmd.Parameters.Add("@VAL1", SqlDbType.VarChar, 100);
cmd.Parameters.Add("@VAL2", SqlDbType.VarChar, 100);
// open connection ONCE, for all updates
con.Open();
foreach (DataGridViewRow row in DgvWhistlSorted.Rows)
{
if (!string.IsNullOrEmpty(row.Cells[5].Value as string))
{
for (int i = 0; i <= DgvWhistlSorted.Rows.Count - 2; i++)
{
VAL1 = DgvWhistlSorted.Rows[i].Cells[6].Value.ToString();
VAL2 = DgvWhistlSorted.Rows[i].Cells[0].Value.ToString();
// set the values
cmd.Parameters["@VAL1"].Value = VAL1;
cmd.Parameters["@VAL2"].Value = VAL2;
// execute query
cmd.ExecuteNonQuery();
}
}
}
// close connection after all updates are done
con.Close();
}
MessageBox.Show("dbo.json is ready");
}
Create the connection ONCE. You're creating a new database connection each time through the loop! In fact, you do not need to create a new command object each time either. You can reuse the command object because the parameters are the same; just clear the parameters each time through the loop.
Also, don't evaluate the grid view row count in the loop condition; store it in a variable.
string query = "UPDATE dbo.json SET RowN = @VAL1 WHERE [A-order] = @VAL2";
int counter = DgvWhistlSorted.Rows.Count - 2;
using (SqlConnection con = new SqlConnection(cnn))
{
con.Open();
using (SqlCommand cmd = new SqlCommand(query, con))
{
//Do your loop in here
for (int i = 0; i <= counter; i++)
{
// clear the previous iteration's parameters before adding them again
cmd.Parameters.Clear();
VAL1 = DgvWhistlSorted.Rows[i].Cells[6].Value.ToString();
VAL2 = DgvWhistlSorted.Rows[i].Cells[0].Value.ToString();
cmd.Parameters.AddWithValue("@VAL1", VAL1);
cmd.Parameters.AddWithValue("@VAL2", VAL2);
cmd.ExecuteNonQuery();
}
}
}
A better idea is to do this in one command, by passing all the data in a Table-Valued Parameter (TVP).
First create the table type. I don't know your data types, so I'm guessing here. Make sure to match the types to the existing table.
CREATE TYPE dbo.OrderJson AS TABLE (
[Order] int PRIMARY KEY,
RowN nvarchar(max) NOT NULL
);
Then you can pass the whole thing in one batch. You need to create a DataTable to pass as the parameter, or you can use an existing datatable.
// UPDATE dbo.json with numbers
private void BtnUpdateSql_Click(object sender, EventArgs e)
{
var table = new DataTable {
Columns = {
{ "Order", typeof(int) },
{ "RowN", typeof(string) },
},
};
foreach (DataGridViewRow row in DgvWhistlSorted.Rows)
if (!string.IsNullOrEmpty(row.Cells[5].Value as string))
table.Rows.Add(row.Cells[0].Value, row.Cells[6].Value);
const string query = @"
UPDATE j
SET RowN = t.RowN
FROM dbo.json j
JOIN @tmp t ON t.[Order] = j.[A-order];
";
using (var con = new SqlConnection(ConfigurationManager.ConnectionStrings["sql"].ConnectionString))
using (var cmd = new SqlCommand(query, con))
{
cmd.Parameters.Add(new SqlParameter("@tmp", SqlDbType.Structured) { Value = table, TypeName = "dbo.OrderJson" });
con.Open();
cmd.ExecuteNonQuery();
}
MessageBox.Show("dbo.json is ready");
}
I found that the fastest way is to save the DataGridView to a SQL table and continue the process with a stored procedure and an update query between the two tables. Now it flies.
Thank you all.

OleDbCommand with 'where in' and dbParameter

I try to delete some rows from a table in an access database file via C#.
This attempt fails with no error which leads me to the conclusion that I have a valid query with incorrect data.
I tried to see if I can query the data with a select statement from my code and I can narrow the problem down to the parameters.
The statement should look as follows
SELECT * FROM tbIndex where pguid in ('4a651816-e15b-4c6a-85c4-74033ca6c423', '0add7bff-a22f-4238-9c7f-e1ff4ed3c7e2', '742fae8b-2692-4a6f-802c-848fad570696', '5e6b65de-2403-4800-a47d-e57c7bd8e0a6')
I tried two different ways (dbCmd2 and dbCmd3), of which the first (dbCmd2) works but is, due to injection problems, not my preferred solution.
using (OleDbCommand dbCmd2 = new OleDbCommand { Connection = m_Connection })
{
dbCmd2.CommandText = "SELECT * FROM tbIndex where pguid in ("+pguid+")";
using (DbDataReader reader = dbCmd2.ExecuteReader())
{
List<object[]> readValuesFromIndex = new List<object[]>();
while (reader.Read())
{
//Point reached
object[] arr = new object[reader.VisibleFieldCount];
reader.GetValues(arr);
//...
}
reader.Close();
}
}
using (OleDbCommand dbCmd3 = new OleDbCommand { Connection = m_Connection })
{
dbCmd3.CommandText = "SELECT * FROM tbIndex where pguid in (@pguid)";
dbCmd3.Parameters.Add("@pguid", OleDbType.VarChar).Value = pguid;
using (DbDataReader reader = dbCmd3.ExecuteReader())
{
List<object[]> readValuesFromIndex = new List<object[]>();
while (reader.Read())
{
//Point not reached
object[] arr = new object[reader.VisibleFieldCount];
reader.GetValues(arr);
//...
}
reader.Close();
}
}
Note that pguid is set to "'4a651816-e15b-4c6a-85c4-74033ca6c423', '0add7bff-a22f-4238-9c7f-e1ff4ed3c7e2', '742fae8b-2692-4a6f-802c-848fad570696', '5e6b65de-2403-4800-a47d-e57c7bd8e0a6'".
I always thought that the second option would simply replace the parameter in a safe manner but this is obviously not the case.
My question is:
Why doesn't the second option return any values?
A parameter is always a single value.
An IN clause requires multiple values, separated by commas.
You can do something like the following to pass them as separate parameters:
string[] guids = pguid.Split(',');
string sqlin = "";
// build one placeholder per value: @Param0,@Param1,...
for (int i = 0; i < guids.Length; i++)
{
sqlin = sqlin + "@Param" + i + ",";
}
dbCmd3.CommandText = "SELECT * FROM tbIndex where pguid in (" + sqlin.TrimEnd(',') + ")";
for (int i = 0; i < guids.Length; i++)
{
// strip the quotes and the whitespace left over from the comma-separated list
dbCmd3.Parameters.Add("@Param" + i, OleDbType.VarChar).Value = guids[i].Replace("'", "").Trim();
}
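The string handling in that approach can be isolated into two small functions, which makes it easy to check without a database. This is a minimal sketch; `InClauseBuilder`, `ParseQuotedList` and `BuildPlaceholders` are hypothetical names, and the input is assumed to be the quoted, comma-separated form shown in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for building a parameterized IN clause.
public static class InClauseBuilder
{
    // Turns "'a', 'b'" into the clean values "a" and "b".
    public static List<string> ParseQuotedList(string quotedList)
    {
        return quotedList.Split(',')
                         .Select(s => s.Trim().Trim('\''))
                         .Where(s => s.Length > 0)
                         .ToList();
    }

    // Produces "@Param0, @Param1, ..." with one placeholder per value.
    public static string BuildPlaceholders(int count)
    {
        return string.Join(", ", Enumerable.Range(0, count).Select(i => "@Param" + i));
    }
}
```

The command text would then be `"SELECT * FROM tbIndex where pguid in (" + InClauseBuilder.BuildPlaceholders(values.Count) + ")"`, with one `Parameters.Add` call per value in the same order, since OleDb binds parameters by position.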

c# change parameter for sql in loop

I want to reuse a parameterized query in a loop.
(This query is a simple example, I don't think I could make the loop inside sql and just return the needed rows)
Instead of
private String sql = "SELECT v FROM t WHERE VAL_1 = @param_1";
for (int n=1;n<10;n++)
{
MySqlCommand m = new MySqlCommand(sql);
m.Parameters.AddWithValue("@param_1", n);
res = Convert.ToInt32(m.ExecuteScalar());
( ... )
}
I'd like to move the setup of the query outside the loop; something like
private String sql = "SELECT v FROM t WHERE VAL_1 = @param_1";
MySqlCommand m = new MySqlCommand(sql);
m.Parameters.Add("@param_1"); // does not exist
for (int n=1;n<10;n++)
{
m.Parameters.Set("@param_1", n); // does not exist
res = Convert.ToInt32(m.ExecuteScalar());
( ... )
}
So the server does not have to parse the same SQL for each iteration of the loop.
Is that possible?
You can add a parameter with
m.Parameters.Add("@param_1", MySqlDbType.Int32);
and later in the loop assign a value with
m.Parameters["@param_1"].Value = n;
If you just need to run the query for a list of parameters without doing different things with each result, you can build a string in a loop like this:
String where_str = "VAL_1 = @param_1 OR VAL_1 = @param_2 OR VAL_1 = @param_3"; // ...
String sql = "SELECT v FROM t WHERE " + where_str;
and then execute the query; it will give the same result.
If you need to keep the results separate, you can do it with a prepared statement. I also recommend reading about stored procedures; they may be the best solution for you in some cases.
Example of a prepared statement (more info in the link):
private static void SqlCommandPrepareEx(string connectionString)
{
using (SqlConnection connection = new SqlConnection(connectionString))
{
connection.Open();
SqlCommand command = new SqlCommand(null, connection);
// Create and prepare an SQL statement.
command.CommandText =
"INSERT INTO Region (RegionID, RegionDescription) " +
"VALUES (@id, @desc)";
SqlParameter idParam = new SqlParameter("@id", SqlDbType.Int, 0);
SqlParameter descParam =
new SqlParameter("@desc", SqlDbType.Text, 100);
idParam.Value = 20;
descParam.Value = "First Region";
command.Parameters.Add(idParam);
command.Parameters.Add(descParam);
// Call Prepare after setting the Commandtext and Parameters.
command.Prepare();
command.ExecuteNonQuery();
// Change parameter values and call ExecuteNonQuery.
command.Parameters[0].Value = 21;
command.Parameters[1].Value = "Second Region";
command.ExecuteNonQuery();
}
}
Yes, this should be possible! Have a look at SQL prepared statements.
You can just use:
cmd = new MySqlCommand("SELECT * FROM yourTable WHERE condition=@val1", MySqlConn.conn);
Before the loop, add the parameter and prepare the command once:
cmd.Parameters.AddWithValue("@val1", value);
cmd.Prepare();
Then, inside the loop, set cmd.Parameters["@val1"].Value and execute the query, for example with
cmd.ExecuteScalar();
Yep, you can do all of those things but unless that's just an example you'd want to use IN with all the values or a join to a bulk loaded temp table if there are a large number of them. The reason is that each round trip to the DB has a significant overhead that you can reduce from n to 1 with either of those techniques.

Data is not inserted into local database

I've recently started learning some database things in .NET (C#) and I have a problem with inserting a list of objects into a local database. I don't know the reason, but after executing the query the database is still empty and there is no error or warning. Can you tell me if there is any problem with the code, or is there any other reason why it does not work properly?
I've tried to debug it, and the code seems to work: it passes the if statement and sets the parameters' values. I've also removed the thread start method and ran it synchronously, but still nothing.
public static void SaveData()
{
new Thread(() =>
{
Thread.CurrentThread.IsBackground = true;
using (SqlConnection conn = new SqlConnection(conString))
{
conn.Open();
using (SqlCommand cmd = new SqlCommand("INSERT INTO Przestoje(Urzadzenie, Czas, Data) VALUES(@nazwa, @czas, @data)", conn))
{
cmd.Parameters.AddWithValue("@nazwa", SqlDbType.NVarChar);
cmd.Parameters.AddWithValue("@czas", SqlDbType.Int);
cmd.Parameters.AddWithValue("@data", SqlDbType.NVarChar);
int count = allEncounters.Count;
for (int i = 0; i < count; i++)
{
if (i >= NextIndex)
{
cmd.Parameters["@nazwa"].Value = allEncounters[i].Name;
cmd.Parameters["@czas"].Value = allEncounters[i].timeOnTimeout * 10;
cmd.Parameters["@data"].Value = allEncounters[i].startDate.ToString();
}
}
NextIndex = count;
}
}
}).Start();
}
At some point you have to actually execute the SQL command, which you currently don't do:
cmd.ExecuteNonQuery();
If you're looking to insert one record, this would be near the end. Though it looks like you're trying to insert multiple records in a loop, so this would of course happen inside the loop. (Once per record.)
You have to execute the SqlCommand. Put this before "NextIndex = count;":
cmd.ExecuteNonQuery();
You forgot to execute the command inside the using block:
cmd.ExecuteNonQuery();
You can trim that down:
for (int i = NextIndex; i < allEncounters.Count; i++)
{
cmd.Parameters["@nazwa"].Value = allEncounters[i].Name;
cmd.Parameters["@czas"].Value = allEncounters[i].timeOnTimeout * 10;
cmd.Parameters["@data"].Value = allEncounters[i].startDate.ToString();
cmd.ExecuteNonQuery();
}
NextIndex = allEncounters.Count;

Parameterize WHERE Clause in Query

Environment:
C#
Visual Studio 2012
.NET Framework 3.5
Hi
Could I parameterize where clause in SQL Server?
In my scenario, once a WHERE clause String is input, application will concatenate it to other part of query and execute in SQL Server then return the result.
For example,
User inputs "[CookingTime] < 30 and [Cost] < 20"
Application creates query "select [RecipeID] from [Recipes] where [CookingTime] < 30 and [Cost] < 20" and executes in SQL Server.
Application returns result to user.
For security reasons, I would like to make the whole WHERE clause a parameter.
But I have no idea how to achieve this.
Thanks in advance.
This is how it can be done
string commandText = "UPDATE Sales.Store SET Demographics = @demographics "
+ "WHERE CustomerID = @ID;";
using (SqlConnection connection = new SqlConnection(connectionString))
{
SqlCommand command = new SqlCommand(commandText, connection);
command.Parameters.Add("@ID", SqlDbType.Int);
command.Parameters["@ID"].Value = customerID;
// Use AddWithValue to assign Demographics.
// SQL Server will implicitly convert strings into XML.
command.Parameters.AddWithValue("@demographics", demoXml);
try
{
connection.Open();
Int32 rowsAffected = command.ExecuteNonQuery();
Console.WriteLine("RowsAffected: {0}", rowsAffected);
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
}
Passing the whole WHERE clause as a parameter will always be vulnerable to SQL injection. To guard against this, you should:
Set up proper permissions, so that even if SQL is injected the user can't access anything that has not been granted. In this respect the sample from @Dhaval is better, because dynamic SQL generation encapsulated in a stored procedure requires fewer permissions to execute.
Check the statement for SQL injection. The simplest way is to check for semicolons in order to prevent additional statements in the batch. A more complex and more precise way is to use the T-SQL DOM parser. For example:
using Microsoft.SqlServer.TransactSql.ScriptDom;
TSql110Parser parser = new TSql110Parser(true);
IList<ParseError> errors = null;
var condition = "a > 100; delete from [Recipes]";
var script = parser.Parse(new StringReader("select [RecipeID] from [Recipes] where " + condition), out errors) as TSqlScript;
if (errors.Count > 0)
{
throw new Exception(errors[0].Message);
}
foreach (var batch in script.Batches)
{
if (batch.Statements.Count == 1)
{
var select = batch.Statements[0] as SelectStatement;
if (select != null)
{
QuerySpecification query = select.QueryExpression as QuerySpecification;
if (query.WhereClause is BooleanBinaryExpression)
{
...
}
}
else
{
throw new Exception("Select statement only allowed");
}
}
else
{
throw new Exception("More than one statement detected");
}
}
You can create a dynamic query in SQL Server and pass the parameter from C#.
Something like this:
Create Procedure usp_Test
@WhereCond NVarchar(max)
AS
Begin
Set NoCount ON
Declare @SQLQuery AS NVarchar(max)
Set @SQLQuery = N'Select * From tblEmployees where ' + @WhereCond
Execute sp_Executesql @SQLQuery
End
C# code to execute the procedure:
DataSet ds = new DataSet();
using(SqlConnection conn = new SqlConnection("ConnectionString"))
{
SqlCommand sqlComm = new SqlCommand("usp_Test", conn);
sqlComm.Parameters.AddWithValue("@WhereCond", WhereCond);
sqlComm.CommandType = CommandType.StoredProcedure;
SqlDataAdapter da = new SqlDataAdapter();
da.SelectCommand = sqlComm;
da.Fill(ds);
}
I guess the original question wanted to find out how to build the query dynamically from the user's input and then use proper SQL parameters to run it.
For SQL parameters, what I normally do is use a generic helper method. A quick example (not tested):
public static class SqlHelpers
{
public static IEnumerable<T> ExecuteAdhocQuery<T>(SqlConnection con, string sql, CommandType cmdType, Func<SqlDataReader, T> converter, params SqlParameter[] args)
{
try
{
using (SqlCommand cmd = new SqlCommand(sql, con) { CommandType = cmdType })
{
cmd.Parameters.AddRange(args);
if (con.State != ConnectionState.Open) { con.Open(); }
var ret = new List<T>();
using (SqlDataReader rdr = cmd.ExecuteReader())
{
while (rdr.Read())
{
ret.Add(converter.Invoke(rdr));
}
}
return ret;
}
}
catch (Exception e)
{
// log error?
Console.WriteLine(e.Message);
Console.WriteLine(e.StackTrace);
throw; // rethrow, preserving the original stack trace
}
}
public static void Test()
{
using (SqlConnection con = new SqlConnection("connection string here"))
{
var data = ExecuteAdhocQuery(con,
"SELECT ID, Name FROM tblMyTable WHERE ID = @Id and Status = @Status;",
CommandType.Text, (x) => new { Id = x.GetInt32(0), Name = x.GetString(1) },
new SqlParameter("@Id", SqlDbType.Int) { Value = 1 },
new SqlParameter("@Status", SqlDbType.Bit) { Value = true });
Console.WriteLine(data.Count());
}
}
}
Of course, this only covers reading; similar methods could be created for insert/update too.
The complicated part is how to make it dynamic with an unknown number of conditions and the relationships between them. A quick suggestion is to use a delegate method or class to do the work. Sample (not tested):
public static Dictionary<string, SqlParameter> GetParamsFromInputString(string inputString)
{
var output = new Dictionary<string, SqlParameter>();
// use Regex to translate the input string (something like "[CookingTime] < 30 and [Cost] < 20" ) into a key value pair
// and then build sql parameter and return out
// The key will be the database field while the corresponding value is the sql param with value
return output;
}
public void TestWithInput(string condition)
{
var parameters = GetParamsFromInputString(condition);
// first build up the sql query:
var sql = "SELECT Id, Name from tblMyTable WHERE " + parameters.Select(m => string.Format("{0}={1}", m.Key, m.Value.ParameterName)).Aggregate((m,n) => m + " AND " + n);
using (SqlConnection con = new SqlConnection("connection string here"))
{
var data = ExecuteAdhocQuery(con,
sql,
CommandType.Text,
(x) => new { Id = x.GetInt32(0), Name = x.GetString(1) },
parameters.Select(m => m.Value).ToArray());
}
}
The static function GetParamsFromInputString is just a sample. In practice it could become very complicated, depending on your needs.
For example, you might want to include the operator (whether it's >, < or <>, ...).
You might also want to include the conjunctions between the conditions, whether that is AND or OR.
Build dedicated classes to do the job if it gets very complicated.
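To make the parsing idea above more concrete, here is one possible minimal sketch that also keeps the comparison operator. It is deliberately restrictive and rests on several assumptions: only integer literals are accepted, conditions may only be joined with AND, and column names are checked against a hard-coded whitelist. `WhereClauseParser`, `Parse` and the whitelist contents are hypothetical and not taken from the answers above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical parser: turns "[CookingTime] < 30 and [Cost] < 20" into
// "[CookingTime] < @p0 AND [Cost] < @p1" plus the parameter values.
public static class WhereClauseParser
{
    // Assumed whitelist; only these columns may appear in user input.
    private static readonly HashSet<string> AllowedColumns =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "CookingTime", "Cost" };

    // One condition: [Column] operator integer (at most 9 digits, so int.Parse cannot overflow).
    private static readonly Regex ConditionPattern = new Regex(
        @"^\s*\[(?<col>\w+)\]\s*(?<op><=|>=|<>|<|>|=)\s*(?<val>\d{1,9})\s*$");

    public static string Parse(string input, IDictionary<string, int> parameters)
    {
        string[] parts = Regex.Split(input, @"\s+and\s+", RegexOptions.IgnoreCase);
        var clauses = new List<string>();
        for (int i = 0; i < parts.Length; i++)
        {
            Match m = ConditionPattern.Match(parts[i]);
            // Anything that is not exactly "[whitelisted column] op number" is rejected.
            if (!m.Success || !AllowedColumns.Contains(m.Groups["col"].Value))
                throw new ArgumentException("Unsupported condition: " + parts[i]);
            string name = "@p" + i;
            parameters[name] = int.Parse(m.Groups["val"].Value);
            clauses.Add("[" + m.Groups["col"].Value + "] " + m.Groups["op"].Value + " " + name);
        }
        return string.Join(" AND ", clauses);
    }
}
```

The returned text can be appended after `WHERE`, and each dictionary entry can be turned into a `SqlParameter`. Because only whitelisted column names and fixed operators ever reach the SQL text, an input such as "a > 100; delete from [Recipes]" is rejected with an exception instead of being executed.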
