C# insert multiple blobs into same table

Can you do an insert statement with multiple file blob reads in the same command?
In the code below inputfile contains the following:
string[] inputfile = {"C:\\test_blob\\blob1.pdf","C:\\test_blob\\blob2.jpg"};
I'm uncertain whether cmd.Parameters can be set before cmd.CommandText, and whether I can pass more than one File.ReadAllBytes() result as a parameter.
public static void insert_blob_file(string dbname, string uid, string pwd, string[] inputfile)
{
    using (var conn = new OdbcConnection("DSN=" + dbname + ";UID=" + uid + ";PWD=" + pwd))
    {
        conn.Open();
        using (var cmd = conn.CreateCommand())
        {
            for (int i = 0; i < inputfile.Length; i++)
            {
                // File.ReadAllBytes opens and closes the file itself,
                // so no separate FileStream is needed here.
                cmd.Parameters.AddWithValue("blob" + i.ToString(), File.ReadAllBytes(inputfile[i]));
            }
            cmd.CommandText = "INSERT INTO MyTable (Id, MyBlobColumn, String1, MyBlobColumn1, String2, String3) VALUES (1, @blob0, 'SomeString', @blob1, 'SomeString', 'SomeString')";
            cmd.ExecuteNonQuery();
        }
    }
}

First, which of the two or three strategies for storing blobs are you even using? Here is a nice listing of the two common approaches and SQL Server's special approach that tries to combine them: https://www.simple-talk.com/sql/learn-sql-server/an-introduction-to-sql-server-filestream/
Secondly, building queries via string concatenation is just going to expose you to SQL injection. You really should be using the SQL parameter syntax instead. Aside from being safer, parameters might even be faster, as the SQL Server does not need to infer types; you can explicitly tell it the types and the proper mapping.
Thirdly, I assume you are calling a function like insert_blob_file in some form of multitasking. SQL operations are network operations, and those can take a really long time one way or the other.
As for the actual problem: when inserting or updating large amounts of data, batching is rather important. You want to do enough at once to avoid overhead, but not so much that you end up locking the table, and thus possibly the whole DB, for a very long time, especially if the network connection to the client is not the fastest. I always advise doing bulk inserts on the DBMS side just to avoid this, but it seems unlikely you can do that here.
With blobs, every insert should be a separate job. Do not even try to do bulk blob inserts.
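To make the parameter and one-insert-per-blob points concrete, here is a minimal sketch; it assumes a simplified version of the question's MyTable schema and an already-open connection, and it uses ODBC's positional ? markers (the parameter names are only labels):

using System.Data.Odbc;
using System.IO;

public static class BlobInsertSketch
{
    // One parameterized INSERT per file; ODBC binds parameters in the order they are added.
    public static void InsertBlobFiles(OdbcConnection conn, string[] inputFiles)
    {
        foreach (string path in inputFiles)
        {
            using (OdbcCommand cmd = conn.CreateCommand())
            {
                cmd.CommandText = "INSERT INTO MyTable (Id, MyBlobColumn) VALUES (?, ?)";
                cmd.Parameters.Add("@id", OdbcType.Int).Value = 1;
                cmd.Parameters.Add("@blob", OdbcType.VarBinary).Value = File.ReadAllBytes(path);
                cmd.ExecuteNonQuery();
            }
        }
    }
}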

Related

How to prevent SQL Injection in this code?

How can I prevent this code from getting SQL injected? It's a login system that I'm learning. Here's the code!
if (!(string.IsNullOrWhiteSpace(textBox1.Text)) && !(string.IsNullOrWhiteSpace(textBox2.Text)))
{
    MySqlConnection mcon = new MySqlConnection("datasource = 127.0.0.1; port = 3306; username = root; password = ; database = rpgmaster;");
    mcon.Open();
    DataTable table = new DataTable();
    MySqlDataAdapter adapter = new MySqlDataAdapter("Select * From users where Username = '" + textBox2.Text + "' and password = '" + textBox1.Text + "'", mcon);
    adapter.Fill(table);
    if (table.Rows.Count <= 0)
    {
        MessageBox.Show("Você não está registrado!");
    }
    else
    {
        MessageBox.Show("Logado com sucesso! ");
    }
    mcon.Close();
}
Thanks for the help! Really appreciate it!
If you're learning, you could perhaps move on from this old low level way of doing data access and use something a bit more modern and easy. Dapper is an example of a library that isn't a huge leap above what you already know but makes your life a lot nicer:
using (var conn = new MySqlConnection("conn str here"))
{
    var sql = "SELECT count(*) FROM tblUsers WHERE username = @u AND password = @p";
    var prm = new {
        u = txtUsername.Text,              // give your textboxes better names than textBox2, textBox1!
        p = txtPassword.Text.GetHashCode() // do NOT store plain text passwords!
    };
    bool valid = await conn.QuerySingleAsync<int>(sql, prm) > 0;
    if (valid)
        ... valid login code
    else
        ... invalid login
}
Some notes on this:
Dapper is a library that you simply give your SQL and parameter values to
the SQL holds @parameter names, like @u
an anonymously typed object has properties named the same as the parameters, each with a value, like u = "my username"
use async/await when running queries; Dapper makes this easy. Avoid jamming your UI up on queries that take 10 seconds to run
in this case you only need to ask the DB to count the matching records; you don't need to download them all to find out if there are any, so we use QuerySingleAsync<int>, which queries a single value of type int, and if it's more than 0, the login was valid
never store passwords in a database in plaintext. Use a one-way hashing function like MD5, SHA256 etc.; even the lowly string.GetHashCode is better than storing plaintext, particularly because people reuse the same passwords all the time, so anyone breaking into your DB (very easy; the password is in the code) will reveal passwords that people probably use in their banking etc. We can't really be asking, on the one hand, how to prevent a huge security hole like SQL injection, and then on the other hand leave a huge security hole like plaintext passwords ;) (see the hashing sketch after these notes)
always give your textboxes a better name than the default textboxX - it takes seconds and makes your code understandable. If Microsoft called all their class property names like that, the entire framework would be full of things like myString.Int1 rather than myString.Length and it would be completely unusable
life is too short to spend it writing AddWithValue statements; use Dapper, Entity Framework, strongly typed datasets... some DB management technology that eases the burden of writing that code
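As a concrete illustration of the hashing note above, here is a minimal sketch using SHA-256 from the base class library; the names are illustrative, and for real logins a salted, deliberately slow scheme such as PBKDF2 or bcrypt would be the better choice:

using System;
using System.Security.Cryptography;
using System.Text;

public static class PasswordHashingSketch
{
    // One-way hash: store and compare this value instead of the plain text password.
    public static string Sha256Hex(string password)
    {
        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(password));
            return Convert.ToHexString(hash); // .NET 5+; use BitConverter.ToString(hash) on older frameworks
        }
    }
}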
Where Dapper makes things really nice for you is its ability to turn objects into queries and vice versa; this above is just a basic count example, but suppose you had a User class:
class User
{
    public string Name { get; set; }
    public string HashedPassword { get; set; }
    public int Age { get; set; }
}
And you had a table tblUsers that was similar (column names the same as the property names), then you could query like:
User u = new User() { Name = "someuser" };
User t = await conn.QuerySingleAsync<User>("SELECT Name, HashedPassword, Age FROM tblUsers WHERE Name = @Name", u);
We want to look up all the info of the someuser user, so we make a new User with that Name set (we could also use an anonymous type, like the previous example) and nothing else, and we pass that as the parameters argument. Dapper will see the query contains @Name, pull the contents of Name from the u user that we passed in, and run the query. When the results return, it will create a User instance for us, fully populated with all the data from the query
To do this the old way we'd have to:
have a command,
have a connection,
add parameters and values,
open the connection,
run the sql,
get a reader,
check if the reader had rows,
loop over the reader pulling the first row,
make a new User,
use reader.GetInt/GetString etc to pull the column values out one by one and
finally return the new user
oh and dispose of all the db stuff, close the connection etc
Writing that code is repetitive, and it is really boring. In computing, when we have something repetitive and boring that we need to do thousands of times throughout our lives (like serializing to JSON, calling a webservice, designing a Windows UI), we find some way to make the computer do the repetitive boring bit; it does it faster and more accurately than we can. This is exactly what Dapper does; it does away with that boring, repetitive code and reduces it to a single line where you say what you want back, using what query, with what parameters. And it keeps your UI working:
await x.QueryAsync<type>(query, parameters)
Win. Seek out some Dapper tutorials! (I have no affiliation)
Try using parameters. Please see the updated sample of your code below:
if (!(string.IsNullOrWhiteSpace(textBox1.Text)) && !(string.IsNullOrWhiteSpace(textBox2.Text)))
{
    using (MySqlConnection mcon = new MySqlConnection("datasource = 127.0.0.1; port = 3306; username = root; password = ; database = rpgmaster;"))
    {
        mcon.Open();
        MySqlCommand cmd = new MySqlCommand("Select * from users where username=?username and password=?password", mcon);
        cmd.Parameters.Add(new MySqlParameter("username", textBox2.Text));
        cmd.Parameters.Add(new MySqlParameter("password", textBox1.Text));
        using (MySqlDataReader dr = cmd.ExecuteReader())
        {
            if (dr.HasRows) // a matching user was found
            {
                MessageBox.Show("Logado com sucesso! ");
            }
            else
            {
                MessageBox.Show("Você não está registrado!");
            }
        }
    }
}
Use parameters to pass values, and check their lengths. Use a stored procedure instead of a query in the code. Select explicit columns instead of * in the SELECT. And please make sure you don't store the plain password in the DB. A sketch of the stored-procedure call is shown below.
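Here is a minimal sketch of that advice in C#; the procedure name usp_CheckLogin, the length limits, and the Hash helper are all illustrative assumptions, not part of the original question, and exact parameter naming can vary between MySQL connector versions:

// Hypothetical: call a login-check stored procedure with length-validated, hashed input.
string user = textBox2.Text.Trim();
string pass = textBox1.Text;
if (user.Length == 0 || user.Length > 50 || pass.Length == 0 || pass.Length > 64)
    return; // reject out-of-range input before touching the DB

using (var mcon = new MySqlConnection(connectionString))
using (var cmd = new MySqlCommand("usp_CheckLogin", mcon))
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.AddWithValue("username", user);
    cmd.Parameters.AddWithValue("passwordHash", Hash(pass)); // Hash = your one-way hash function
    mcon.Open();
    using (MySqlDataReader dr = cmd.ExecuteReader())
    {
        bool registered = dr.HasRows;
    }
}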
Use Parameters
using (MySqlConnection mcon = new MySqlConnection(connectionString))
{
    string commandText = "SELECT * FROM users WHERE Username = @tbxText";
    MySqlCommand command = new MySqlCommand(commandText, mcon);
    command.Parameters.AddWithValue("@tbxText", textBox2.Text);
}

SqlBulkCopy succeeds but inserts no records

I am trying to insert a large number of records into a variety of tables. I cannot fit all of the records into memory at once, so instead, I am using an IDataReader implementation to fit some of the data into memory and then dump it into the database with SqlBulkCopy.
The problem I am experiencing is that the second time I try to write to a table, SqlBulkCopy will succeed but fail to actually insert the records. I thought it was a transaction issue at first, but I disabled all transactions on my connection and am still seeing the same problem. I can also independently confirm the size of the tables before and after both inside the code and outside.
Here is a code snippet:
long before = GetCount(tableName);
DataServerConnection conn = GetConnection();
using (var batch = conn.HasTransaction ?
    new SqlBulkCopy((SqlConnection)conn.IDbConnection, SqlBulkCopyOptions.Default, (SqlTransaction)conn.Transaction) :
    new SqlBulkCopy((SqlConnection)conn.IDbConnection))
{
    batch.DestinationTableName = tableName;
    batch.WriteToServer(reader);
}
long after = GetCount(tableName);
if ((reader.Count + before) != after)
{
    throw new Exception($"Not all records inserted: Before = {before}, After = {after}, Reader Count = {reader.Count}, Expected = {reader.Count + before}");
}
Any ideas what I am missing? GetCount(tableName) is doing a simple
SELECT COUNT(*) FROM [{tableName}]
reader is a basic IDataReader implementation that I have verified works in other places on millions of records. GetConnection() returns a wrapper for the connection, which helps prevent me from having to manage my connections constantly.
I'm not sure how your reader variable is declared, so I'll tell you how I do it sometimes (and this doesn't mean it is the best way to do it).
First I declare a strongly typed DataTable based on the DataSet I have (a table adapter would not work here, since SqlBulkCopy.WriteToServer expects a DataTable or a reader):
MyDataSet.MyTableDataTable dataTable = new MyDataSet.MyTableDataTable();
Then, I create the connection.
SqlConnection sql = new SqlConnection(...);
And then the BulkCopy variable:
SqlBulkCopy insertData = new SqlBulkCopy(sql);
Once I have this, I start adding rows to the typed DataTable like this:
dataTable.AddMyTableRow(...);
When I am finished, I do the bulk insertion:
sql.Open();
insertData.DestinationTableName = "MyTable";
insertData.WriteToServer(dataTable);
sql.Close();
Let me know if this helps you.
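One more thing worth checking, given the "succeeds but inserts nothing" symptom: if the wrapped connection carries an open transaction, SqlBulkCopy writes into that transaction, and a count taken from a different connection will not see the rows until it is committed. A minimal sketch of the explicit-transaction pattern (the names are illustrative, not the question's wrapper):

using System.Data;
using System.Data.SqlClient;

public static void BulkInsertWithCommit(string connectionString, string tableName, IDataReader reader)
{
    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();
        using (SqlTransaction transaction = connection.BeginTransaction())
        {
            using (var bulk = new SqlBulkCopy(connection, SqlBulkCopyOptions.Default, transaction))
            {
                bulk.DestinationTableName = tableName;
                bulk.WriteToServer(reader);
            }
            // Without this commit, a COUNT(*) from another connection
            // will not see the newly written rows.
            transaction.Commit();
        }
    }
}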

Batch insert to SQL Server table from DataTable using ODBC Connection

I have been asked to look at finding the most efficient way to take a DataTable input and write it to a SQL Server table using C#. The snag is that the solution must use ODBC connections throughout, which rules out SqlBulkCopy. The solution must also work on all SQL Server versions back to SQL Server 2008 R2.
I am thinking that the best approach would be to use batch inserts of 1000 rows at a time using the following SQL syntax:
INSERT INTO dbo.Table1(Field1, Field2)
SELECT Value1, Value2
UNION
SELECT Value1, Value2
I have already written the code to check if a table corresponding to the DataTable input already exists on the SQL Server, and to create one if it doesn't.
I have also written the code to create the INSERT statement itself. What I am struggling with is how to dynamically build the SELECT statements from the rows in the data table. How can I access the values in the rows to build my SELECT statement? I think I will also need to check the data type of each column in order to determine whether the values need to be enclosed in single quotes (') or not.
Here is my current code:
public bool CopyDataTable(DataTable sourceTable, OdbcConnection targetConn, string targetTable)
{
    OdbcTransaction tran = null;
    string[] selectStatement = new string[sourceTable.Rows.Count];
    // Check if targetTable exists, create it if it doesn't
    if (!TableExists(targetConn, targetTable))
    {
        bool created = CreateTableFromDataTable(targetConn, sourceTable);
        if (!created)
            return false;
    }
    try
    {
        // Prepare insert statement based on sourceTable
        string insertStatement = string.Format("INSERT INTO [dbo].[{0}] (", targetTable);
        foreach (DataColumn dataColumn in sourceTable.Columns)
        {
            insertStatement += dataColumn + ",";
        }
        insertStatement = insertStatement.TrimEnd(',') + ") ";
        // Open connection to target db
        using (targetConn)
        {
            if (targetConn.State != ConnectionState.Open)
                targetConn.Open();
            tran = targetConn.BeginTransaction();
            for (int i = 0; i < sourceTable.Rows.Count; i++)
            {
                DataRow row = sourceTable.Rows[i];
                // Need to iterate through columns in row, getting values and data types and building a SELECT statement
                selectStatement[i] = "SELECT ";
            }
            insertStatement += string.Join(" UNION ", selectStatement);
            using (OdbcCommand cmd = new OdbcCommand(insertStatement, targetConn, tran))
            {
                cmd.ExecuteNonQuery();
            }
            tran.Commit();
            return true;
        }
    }
    catch
    {
        if (tran != null)
            tran.Rollback();
        return false;
    }
}
Any advice would be much appreciated. Also if there is a simpler approach than the one I am suggesting then any details of that would be great.
OK, since we cannot use stored procedures or bulk copy: when I modelled the various approaches a couple of years ago, the key determinant of performance was the number of calls to the server, so batching a set of MERGE or INSERT statements into a single call separated by semicolons was found to be the fastest method. I ended up batching my SQL statements. I think the max size of a SQL statement was 32k, so I chopped my batch into units of that size.
(Note - use StringBuilder instead of concatenating strings manually - it has a beneficial effect on performance)
Pseudo-code:
string sqlStatement = "INSERT INTO Tab1 VALUES ({0},{1},{2})";
StringBuilder sqlBatch = new StringBuilder();
foreach (DataRow row in myDataTable.Rows)
{
    sqlBatch.Append(string.Format(sqlStatement, row["Field1"], row["Field2"], row["Field3"]));
    sqlBatch.AppendLine(";");
}
myOdbcConnection.ExecuteSql(sqlBatch.ToString());
You need to deal with batch size complications, and with formatting the field values correctly for their data types in the string-replace step, but otherwise this will give the best performance.
The marked solution from PhillipH is open to several mistakes and to SQL injection.
Normally you should build a DbCommand with parameters and execute that, instead of executing a self-built SQL statement.
The CommandText must be "INSERT INTO Tab1 VALUES (?,?,?)" for ODBC and OLEDB; SqlClient needs named parameters ("@<Name>").
Parameters should be added with the dimensions of the underlying column.
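A minimal sketch of that parameterized approach over ODBC; the column names and types are illustrative, and the command is prepared once and then re-executed per row:

using System.Data;
using System.Data.Odbc;

public static void InsertRows(OdbcConnection conn, DataTable table)
{
    using (OdbcCommand cmd = conn.CreateCommand())
    {
        // ODBC uses positional '?' markers; parameters bind in the order they are added.
        cmd.CommandText = "INSERT INTO Tab1 (Field1, Field2, Field3) VALUES (?, ?, ?)";
        OdbcParameter p1 = cmd.Parameters.Add("@f1", OdbcType.Int);
        OdbcParameter p2 = cmd.Parameters.Add("@f2", OdbcType.VarChar, 50); // size from the underlying column
        OdbcParameter p3 = cmd.Parameters.Add("@f3", OdbcType.Int);
        cmd.Prepare(); // compile once, execute many times

        foreach (DataRow row in table.Rows)
        {
            p1.Value = row["Field1"];
            p2.Value = row["Field2"];
            p3.Value = row["Field3"];
            cmd.ExecuteNonQuery();
        }
    }
}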

Trying to implement grouping in IEnumerable to stream from Database

Currently the application I'm working with uses strongly typed DataSets to work with data from the DB. We have a table called COM_ControlIn that represents a "file" and several other tables have a relationship with the control table. The one I need to stream from is called COM_GenericTransactionItems. There is a column in this table called COMControlIn_UID which links it up to the control table as the name suggests.
We have several methods to fetch data from this table, such as one that finds all records for a given COMControlIn_UID, but the problem with all of these is that they fetch all records at once, which is becoming a problem now that the sheer amount of data is causing us to hit .NET's memory limit. All of our existing code uses strongly typed datasets built from XSDs generated by Visual Studio from the database schema.
My idea was to use IEnumerable to "stream" batches of records from the database instead of fetching everything at once, while still keeping the strongly typed datasets we've used previously to keep it compatible without major changes. The code I've written looks more or less like this:
COM_GenericTransactionItemsDS com_GenericTransactionItemsDS = new COM_GenericTransactionItemsDS();
long lastUID = 0;
using (SqlConnection sqlConnection = new SqlConnection("..."))
{
    sqlConnection.Open();
    SqlCommand sqlCommand = new SqlCommand("SELECT MAX(UID) FROM COM_GenericTransactionItems WHERE COMControlIn_UID = " + p_COMControlIn_UID, sqlConnection);
    //because apparently I'm not allowed to straight cast...
    long maxUID = Convert.ToInt64(sqlCommand.ExecuteScalar());
    while (lastUID < maxUID)
    {
        com_GenericTransactionItemsDS.Clear();
        using (SqlDataAdapter sqlDataAdapter = new SqlDataAdapter())
        {
            //Build Select
            string strSQL = "SELECT TOP(" + fetchAmount + ") " + SQL_Columns + " FROM COM_GenericTransactionItems " +
                "WHERE COMControlIn_UID = " + p_COMControlIn_UID.ToString() + " AND UID > " + lastUID + " ORDER BY UID";
            //Get Data
            sqlDataAdapter.SelectCommand = new SqlCommand(strSQL, sqlConnection);
            sqlDataAdapter.SelectCommand.CommandTimeout = Convert.ToInt32(context.strContext[(int)eCCE_Context._COMMAND_TIMEOUT]);
            sqlDataAdapter.Fill(com_GenericTransactionItemsDS, "COM_GenericTransactionItems");
            lastUID = com_GenericTransactionItemsDS.COM_GenericTransactionItems.Max(r => r.UID);
        }
        yield return com_GenericTransactionItemsDS;
    }
}
It works extremely well for fetching data and has dropped our memory usage significantly, but I have run into a problem a little further down the line.
I need to group items within this table by a specific column (a date), but the notion of this conflicts with the whole batching approach, because you need to know what your entire dataset looks like to do the grouping.
I can't do the grouping in SQL because I need the data in a sort of key-value pair like Linq used to give me before I switched to using this method (unless there is a way for me to do this in SQL).
When I try using SelectMany to flatten all of my rows into one enumerable I get RowNotInTableException whenever I try to access any of them. I don't really know what else to try.
For reference, this is the Linq query I use to do the grouping:
var dateGroups = from row in p_COM_GenericTransactionItemsDS.SelectMany(c => c.COM_GenericTransactionItems)
                 group row by (DateTime)row[tableDefinitions.CaptureDate] into groups
                 select groups;
I think the problem lies with the way I'm returning data from my streaming method, but I don't know how else to do it. Ideally I'd like to extract all the rows out of our data tables into an IEnumerable and just iterate through that, but DataRows don't keep the table's schema (I've read the schema is kept in the DataTable they're related to), so once you remove them from the dataset they are essentially useless.
I've solved my problem. I changed my streaming method to loop through the items it receives in a batch, make a copy of them and return them one by one, like so:
foreach (COM_GenericTransactionItemsDS.COM_GenericTransactionItemsRow row in com_GenericTransactionItemsDS.COM_GenericTransactionItems.Rows)
{
    lastUID = row.UID;
    COM_GenericTransactionItemsDS.COM_GenericTransactionItemsRow newRow = com_GenericTransactionItemsDS.COM_GenericTransactionItems.NewCOM_GenericTransactionItemsRow();
    newRow.ItemArray = row.ItemArray;
    yield return newRow;
}
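With the iterator yielding detached row copies, the grouping from the question can then run over the flattened sequence. A small usage sketch (the method name StreamTransactionItems is illustrative); note that GroupBy still buffers the rows it has seen, so the batching only helps as long as the grouped data itself fits in memory:

// Hypothetical name for the iterator method above.
var dateGroups = StreamTransactionItems(p_COMControlIn_UID)
    .GroupBy(row => (DateTime)row[tableDefinitions.CaptureDate]);

foreach (var group in dateGroups)
{
    Console.WriteLine($"{group.Key:yyyy-MM-dd}: {group.Count()} rows");
}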

SQL Insert one row or multiple rows data?

I am working on a console application to insert data into an MS SQL Server 2005 database. I have a list of objects to be inserted. Here I use an Employee class as an example:
List<Employee> employees;
What I can do is insert one object at a time, like this:
foreach (Employee item in employees)
{
    string sql = @"INSERT INTO Mytable (id, name, salary)
                   values ('@id', '@name', '@salary')";
    // replace @par with values
    cmd.CommandText = sql; // cmd is IDbCommand
    cmd.ExecuteNonQuery();
}
Or I can build a bulk insert query like this:
string sql = @"INSERT INTO MyTable (id, name, salary) ";
int count = employees.Count;
int index = 0;
foreach (Employee item in employees)
{
    sql = sql + string.Format(
        "SELECT {0}, '{1}', {2} ",
        item.ID, item.Name, item.Salary);
    if (index != (count - 1))
        sql = sql + " UNION ALL ";
    index++;
}
cmd.CommandText = sql;
cmd.ExecuteNonQuery();
I guess the latter case is going to insert all the rows at once. However, if I have several thousand rows of data, is there any limit on the length of the SQL query string?
I am not sure if one insert with multiple rows is better than one insert per row, in terms of performance.
Any suggestions for a better way to do it?
Actually, the way you have it written, your first option will be faster.
Your second example has a problem in it. You are doing sql = sql + etc. This is going to cause a new string object to be created for each iteration of the loop (check out the StringBuilder class). Technically, you are going to be creating a new string object in the first instance too, but the difference is that it doesn't have to copy all the information from the previous string over.
The way you have it set up, SQL Server is potentially going to have to evaluate a massive query when you finally send it, which is definitely going to take some time to figure out what it is supposed to do. I should state that this depends on how many inserts you need to do: if n is small, you are probably going to be OK, but as it grows your problem will only get worse.
Bulk inserts are faster than individual ones due to how SQL Server handles batch transactions. If you are going to insert data from C#, you should take the first approach and wrap, say, every 500 inserts into a transaction and commit it, then do the next 500 and so on. This also has the advantage that, if a batch fails, you can trap it, figure out what went wrong and re-insert just that batch. There are other ways to do it, but that would definitely be an improvement over the two examples provided.
var iCounter = 0;
IDbTransaction tran = null;
foreach (Employee item in employees)
{
    if (iCounter == 0)
    {
        // transactions are started on the connection, not the command
        tran = conn.BeginTransaction(); // conn is the open IDbConnection
        cmd.Transaction = tran;
    }
    string sql = @"INSERT INTO Mytable (id, name, salary)
                   values ('@id', '@name', '@salary')";
    // replace @par with values
    cmd.CommandText = sql; // cmd is IDbCommand
    cmd.ExecuteNonQuery();
    iCounter++;
    if (iCounter >= 500)
    {
        tran.Commit();
        iCounter = 0;
    }
}
if (iCounter > 0)
    tran.Commit();
In MS SQL Server 2008 you can create a table-valued user-defined type that will contain your table:
CREATE TYPE MyUdt AS TABLE (Id int, Name nvarchar(50), salary int)
Then you can use this UDT in your stored procedures and your C# code to do batch inserts.
SP:
CREATE PROCEDURE uspInsert
    (@MyTvp AS MyUdt READONLY)
AS
    INSERT INTO [MyTable]
    SELECT * FROM @MyTvp
C# (imagine that the records you need to insert are already contained in table "MyTable" of DataSet ds):
using (conn)
{
    SqlCommand cmd = new SqlCommand("uspInsert", conn);
    cmd.CommandType = CommandType.StoredProcedure;
    SqlParameter myParam = cmd.Parameters.AddWithValue("@MyTvp", ds.Tables["MyTable"]);
    myParam.SqlDbType = SqlDbType.Structured;
    myParam.TypeName = "dbo.MyUdt";
    // Execute the stored procedure
    cmd.ExecuteNonQuery();
}
So, this is the solution.
Finally, I want to warn you against using code like yours (building strings and then executing the string), because this way of executing is open to SQL injection.
Look at this thread; I've answered there about table-valued parameters.
Bulk-copy is usually faster than doing inserts on your own.
If you still want to do it in one of your suggested ways, you should make it so that you can easily change the size of the queries you send to the server. That way you can optimize for speed in your production environment later on. Query times may vary a lot depending on the query size.
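A minimal sketch of what making the batch size easy to change could look like; the helper and the 500 default are illustrative, not from the question:

using System.Collections.Generic;
using System.Linq;

public static class BatchingSketch
{
    // Split a list into consecutive chunks of at most batchSize items,
    // so each chunk can be sent to the server as one INSERT statement.
    public static IEnumerable<List<T>> InBatches<T>(IReadOnlyList<T> items, int batchSize)
    {
        for (int i = 0; i < items.Count; i += batchSize)
        {
            yield return items.Skip(i).Take(batchSize).ToList();
        }
    }
}

// Usage: tune batchSize until round trips vs. statement size balance out.
// foreach (var batch in BatchingSketch.InBatches(employees, batchSize: 500)) { /* build and run one INSERT */ }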
The maximum batch size for a SQL Server query is listed as 65,536 * the network packet size. The network packet size is 4 KB by default but can be changed, so with the default that works out to 256 MB per batch. Check out the Maximum Capacity Specifications article for SQL Server 2008 to get the scope; SQL Server 2005 also appears to have the same limit.
