Searching SQL Server with the LIKE Operator - C#

I have a problem reading rows from SQL Server 2005 in C#.
The idea:
In my database (SQL Server 2005 Express) there is a table with a column (of datatype ntext) containing HTML code.
In my C# application the user can enter a sentence (HTML code) and search for the rows that contain it.
The query generated by my app is:
USE test
SELECT
al.aal_Id As ID,
al.aal_Description As Opis,
au.au_Title As Tytul_szablonu,
au.au_Note As Nazwa_szablonu
FROM dbo.au_Allegro al
LEFT OUTER JOIN dbo.au__Auction au ON (al.aal_AuctionId = au.au_Id)
WHERE
au.au_Type = 11
AND al.aal_Description COLLATE SQL_Latin1_General_CP1_CS_AS LIKE '%%' ESCAPE '\'
In my app I convert special characters (e.g. ' and ,) and add the escape character.
The user tries to search for a very long sentence (about 7,000+ characters); when he does, the sqlservr.exe process consumes all of his RAM and the search takes 30+ minutes (he has about 1,000 rows in this table).
The query returns 0 rows.
When he runs the same query in SQL Server Management Studio, the database shows results (with rows) in a few seconds.
In my app I use SqlDataAdapter:
System.Data.DataTable dt = new System.Data.DataTable();
System.Data.SqlClient.SqlCommand cmd = new System.Data.SqlClient.SqlCommand();
cmd.CommandTimeout = 0;      // no timeout
cmd.CommandText = kwerenda;  // kwerenda ("query" in Polish) holds the generated SQL
cmd.Connection = conn;
// pass the command (not the raw string) so CommandTimeout actually applies
System.Data.SqlClient.SqlDataAdapter da = new System.Data.SqlClient.SqlDataAdapter(cmd);
try
{
    da.Fill(dt);
}
I tried SqlDataReader:
dr = cmd.ExecuteReader();
while (dr.Read())
{
string id = dr["ID"].ToString();
string opis = dr["Opis"].ToString();
string tytul = dr["Tytul_szablonu"].ToString();
string nazwa = dr["Nazwa_szablonu"].ToString();
dt.Rows.Add(id, opis, tytul, nazwa);
}
When I tried to simulate this in my test database I had no problems searching for the same sentences.
Have you got any tips for me?
I can't make any changes to the user's table, and I can't go to him and check what happens.

Is the SQL command executing a stored procedure? If so, you might be getting different query plans, which may explain the timing difference between the apps. Your ADO.NET call might be affected by something known as parameter sniffing, which can cause radically different query execution times.
There are a couple of things you can do to avoid this problem and yield consistent results (a sketch of the first follows this list).
Convert parameters to local variables inside of the stored procedure.
Disable the feature on the SQL Server altogether.
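For illustration, a minimal sketch of the local-variable workaround, written here as an ad hoc batch rather than a stored procedure (the parameter and variable names are assumptions; the table and column come from the question; SQL Server 2005 requires DECLARE and SET as separate statements):
// Sketch only: copying the incoming parameter into a local variable keeps
// the optimizer from sniffing the caller's specific value.
string sql = @"
DECLARE @localPattern nvarchar(max);
SET @localPattern = @pattern;   -- local copy defeats parameter sniffing
SELECT al.aal_Id, al.aal_Description
FROM dbo.au_Allegro al
WHERE al.aal_Description LIKE @localPattern ESCAPE '\';";

using (System.Data.SqlClient.SqlCommand cmd = new System.Data.SqlClient.SqlCommand(sql, conn))
{
    // -1 means nvarchar(max); searchText is the user's (escaped) sentence
    cmd.Parameters.Add("@pattern", System.Data.SqlDbType.NVarChar, -1).Value = "%" + searchText + "%";
    using (System.Data.SqlClient.SqlDataReader reader = cmd.ExecuteReader())
    {
        while (reader.Read()) { /* consume rows */ }
    }
}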
Also, your syntax looks suspect, as John pointed out. It would be better to use the NVARCHAR(MAX) datatype for that column if possible; NTEXT should be avoided as it has been deprecated.
A better alternative to LIKE searches on a non-indexed column like this is to use SQL Server's Full-Text Search, which is optimized for these types of queries.
http://msdn.microsoft.com/en-us/library/ms142571.aspx
http://www.developer.com/article.php/3446891

A couple of things you might want to do.
First, don't use ntext. SQL Server 2005 has a datatype called nvarchar(max). It's MUCH better for storing large amounts of text. Furthermore, ntext has been deprecated, so save yourself some trouble and convert it now. See this link on how to do this successfully.
Second, the query you posted is unusual. You have a left outer join, but also a where clause on the outer-joined table. Because of the where clause it's (hopefully) being converted into an inner join. You should either just write it that way OR move the au.au_Type = 11 condition into the join clause. I doubt you want the latter.
Third, when the client runs the query the first time through your app, it generates a query plan based on those parameters. Running the exact same query shortly thereafter in Management Studio will reuse that plan and the cached data, so the second pass will be fast; no surprise there.
Fourth, I don't think you posted the actual query that was run. I suspect there is some data in the parameter you are comparing that either isn't escaping properly OR uses one of the reserved pattern characters such as '[', ']', '^', etc. (a sketch of an escaping helper follows).
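For what it's worth, here is a minimal, hypothetical helper for escaping the characters LIKE treats specially, assuming '\' is used as the ESCAPE character as in the question's query:
// Hypothetical helper: prefix LIKE's special characters (and the escape
// character itself) with '\' so they match literally under ESCAPE '\'.
static string EscapeLikePattern(string input)
{
    var sb = new System.Text.StringBuilder(input.Length);
    foreach (char c in input)
    {
        if (c == '\\' || c == '%' || c == '_' || c == '[')
            sb.Append('\\');
        sb.Append(c);
    }
    return sb.ToString();
}
The escaped result would then be wrapped in '%' wildcards before being embedded in (or, better, bound as a parameter to) the query.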

Related

Same query with the same query plan takes ~10x longer when executed from ADO.NET vs. SSMS

My query is fairly complex, but I have simplified it to figure out this problem and now it is a simple JOIN that I'm running on a SQL Server 2014 database. The query is:
SELECT * FROM SportsCars as sc INNER JOIN Cars AS c ON c.CarID = sc.CarID WHERE c.Type = 1
When I run this query from SSMS and watch it in SQL Profiler, it takes around 350ms to execute. When I run the same query inside my application using Entity Framework or ADO.NET (I've tried both), it takes 4500ms to execute.
ADO.NET Code:
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    var cmdA = new SqlCommand("SET ARITHABORT ON", connection);
    cmdA.ExecuteNonQuery();
    var query = "SELECT * FROM SportsCars as sc INNER JOIN Cars AS c ON c.CarID = sc.CarID WHERE c.Type = 1";
    var cmd = new SqlCommand(query, connection);
    cmd.ExecuteNonQuery();
}
I've done an extensive Google search and found this awesome article and several StackOverflow questions (here and here). In order to make the session parameters identical for both queries, I call SET ARITHABORT ON in ADO.NET, and it makes no difference. This is a straight SQL query, so there is no parameter-sniffing problem. I've simplified the query and the indexes down to their most basic form for this test. There is nothing else running on the server and nothing else accessing the database during the test. There are no computed columns in the Cars or SportsCars table, just INTs and VARCHARs.
The SportsCars table has about 170k records and 4 columns, and the Cars table has about 1.2M records and 7 columns. The resulting data set (SportsCars of Type=1) has about 2600 records and 11 columns. I have a single non-clustered index on the Cars table, on the [Type] column that includes all the columns of the cars table. And both tables have a clustered index on the CarID column. No other indexes exist on either table. I'm running as the same database user in both cases.
When I view the data in SQL Profiler, I see that both queries are using the exact same, very simple query plan. In SQL Profiler, I'm using the Performance Event Class and the ShowPlan XML Statistics Profile, which I believe to be the proper event to monitor and capture the actual execution plan. The # of reads is the same for both queries (2596).
How can two identical queries with the exact same query plan take 10x longer in ADO.NET vs. SSMS?
Figured it out:
Because I'm using Entity Framework, the connection string in my application has MultipleActiveResultSets=True. When I remove this from the connection string, the queries have the same performance in ADO.NET and SSMS.
Apparently there is an issue with this setting causing queries to respond slowly when connected to SQL Server via WAN. I found this link and this comment:
MARS uses "firehose mode" to retrieve data. Firehose mode means that
the server will produce data as fast as possible. This also means that
your client application must receive inbound data at the same speed as
it comes in. If it doesn't the data storage buffers on the server will
fill up and the processing will stop until those buffers empty.
So what? You may ask... But as long as the processing is stopped the
resources on the SQL server are in use and are tied up. This includes
the worker thread, schema and data locks, memory, etc. So it is
crucial that your client application consumes the inbound results as
quickly as they arrive.
I have to use this setting with Entity Framework otherwise lazy loading will generate exceptions. So I'm going to have to figure out some other workaround. But at least I understand the issue now.
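For reference, a rough sketch of the connection-string difference (the server and database names are placeholders):
// With MARS enabled (required here for EF lazy loading, but the culprit
// for the slow queries over WAN in this scenario):
var withMars = "Server=myServer;Database=myDb;Integrated Security=true;MultipleActiveResultSets=True";
// Without MARS (the same query ran ~10x faster):
var withoutMars = "Server=myServer;Database=myDb;Integrated Security=true";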
How can two exact same queries with the exact same query plan take 10x longer in ADO.NET vs. SSMS?
First we need to be clear about what is considered "same" with regards to queries and query plans. Assuming that the query at the very top of the question is a copy-and-paste, it is not the same query as the one being submitted via ADO.NET. For two queries to be the same, they need to be byte-by-byte the same, which includes all white-space, capitalization, punctuation, comments, etc.
The two queries shown are definitely very similar. And they might even share the same execution plan. But how was "same"ness determined for those? Was the XML the same in both cases? Or just what was shown graphically in SSMS when viewing the plans? If they were determined to be the same based on their graphical representation then that is sometimes misleading. The XML itself needs to be checked. Even if two query plans have the same query hash, there are still (sometimes) parts of a query plan that are variable and changes do not change the plan hash. One example is the evaluation of expressions. Sometimes they are calculated and their result is embedded into the plan as a constant. Sometimes they are calculated at the start of each execution and stored and reused within that particular execution, but not for any subsequent executions.
One difference between SSMS and ADO.NET is the default session properties for each. I thought I had seen a chart years ago showing the defaults for ADO / OLEDB / SQLNCLI but can't find it now. Either way, it doesn't need to be guesswork, as it can be discovered using the SESSIONPROPERTY function. Just run this query in the C# code instead of your current SELECT, and inspect the results in debug, or print them out, or whatever. Run something like this:
SELECT SESSIONPROPERTY('ANSI_NULLS') AS [AnsiNulls],
SESSIONPROPERTY('ANSI_PADDING') AS [AnsiPadding],
SESSIONPROPERTY('CONCAT_NULL_YIELDS_NULL') AS [ConcatNullYieldsNull],
...;
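A minimal sketch of running that check from C# (the connection string is assumed; extend the SELECT with the remaining settings from the linked page):
// Sketch: dump the session settings that ADO.NET actually receives.
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    var sql = @"SELECT SESSIONPROPERTY('ANSI_NULLS') AS [AnsiNulls],
                       SESSIONPROPERTY('ANSI_PADDING') AS [AnsiPadding],
                       SESSIONPROPERTY('ARITHABORT') AS [ArithAbort];"; // add the rest here
    using (var cmd = new SqlCommand(sql, connection))
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            for (int i = 0; i < reader.FieldCount; i++)
            {
                Console.WriteLine("{0} = {1}", reader.GetName(i), reader.GetValue(i));
            }
        }
    }
}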
Make sure to get all of the settings noted in the linked MSDN page. Now, in SSMS, go to the "Query" menu, select "Query Options...", and go to "Execution" | "ANSI". The settings coming back from the C# code need to match the ones shown in SSMS. Any setting that differs requires adding something like this to the beginning of your ADO.NET query string:
SET ANSI_NULLS ON;
{rest of query}
Now, if you want to eliminate the DataTable loading from being a possible suspect, just replace:
var cars = new DataTable();
cars.Load(reader);
with:
while(reader.Read());
And lastly, why not just put the query into a stored procedure? The session settings (i.e. ANSI_NULLS, etc.) that typically matter the most are stored with the proc definition, so they should work the same whether you EXEC from SSMS or from ADO.NET (again, we aren't dealing with any parameters here).

Calling a SQL select statement from C# thousands of times is very time consuming. Is there a better way?

I get a list of IDs and amounts from an Excel file (thousands of IDs and corresponding amounts). I then need to check the database to see if each ID exists and, if it does, check that the amount in the DB is greater than or equal to the amount from the Excel file.
The problem is that running this select statement upwards of 6,000 times and returning the values I need takes a long time. Even at half a second apiece it will take about an hour to do all the selects. (I normally don't get more than 5 results back.)
Is there a faster way to do this?
Is it possible to somehow pass all the IDs at once, make just 1 call, and get the whole collection back?
I have tried using SqlDataReaders and SqlDataAdapters, but they seem to take about the same time (too long either way).
General idea of how this works below
for (int i = 0; i < ID.Count; i++)
{
    SqlCommand cmd = new SqlCommand("select Amount, Client, Pallet from table where ID = @ID and Amount > 0;", sqlCon);
    cmd.Parameters.Add("@ID", SqlDbType.VarChar).Value = ID[i];
    SqlDataAdapter da = new SqlDataAdapter(cmd);
    da.Fill(dataTable);
    da.Dispose();
}
Instead of a long IN list (difficult to parameterise, and with a number of other inefficiencies regarding execution plans: compilation time, plan reuse, and the quality of the plans themselves) you can pass all the values in at once via a table-valued parameter.
See arrays and lists in SQL Server for more details.
Generally I make sure to give the table type a primary key and use OPTION (RECOMPILE) to get the most appropriate execution plans; a sketch follows.
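As an illustration only, a hedged sketch, assuming a user-defined table type has already been created with CREATE TYPE dbo.IdList AS TABLE (Id int PRIMARY KEY) and that the IDs are integers (adjust the column type if they are strings):
// Build a DataTable shaped like the table type and pass it as a single
// structured (table-valued) parameter instead of 6000 separate queries.
var ids = new DataTable();
ids.Columns.Add("Id", typeof(int));
foreach (var id in idList)            // idList: the IDs read from the Excel file
{
    ids.Rows.Add(id);
}

var sql = @"SELECT t.Amount, t.Client, t.Pallet
            FROM dbo.[table] t
            JOIN @ids i ON i.Id = t.ID
            WHERE t.Amount > 0
            OPTION (RECOMPILE);";

using (var cmd = new SqlCommand(sql, sqlCon))
{
    var p = cmd.Parameters.AddWithValue("@ids", ids);
    p.SqlDbType = SqlDbType.Structured;
    p.TypeName = "dbo.IdList";        // must match the CREATE TYPE name
    new SqlDataAdapter(cmd).Fill(dataTable);
}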
Combine all the IDs together into a single large IN clause, so it reads like:
select Amount, Client, Pallet from table where ID in (1,3,5,7,9,11) and Amount > 0;
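If you build the IN clause yourself, it is safer to parameterise each value than to concatenate them; a rough sketch:
// Sketch: generate one parameter per ID (@p0, @p1, ...) so the query stays
// parameterised while still using a single IN clause.
var names = new string[ID.Count];
for (int i = 0; i < ID.Count; i++)
{
    names[i] = "@p" + i;
}
var sql = "select Amount, Client, Pallet from table where ID in ("
          + string.Join(",", names) + ") and Amount > 0;";
using (var cmd = new SqlCommand(sql, sqlCon))
{
    for (int i = 0; i < ID.Count; i++)
    {
        cmd.Parameters.AddWithValue(names[i], ID[i]);
    }
    new SqlDataAdapter(cmd).Fill(dataTable);
}
Note that SQL Server caps the number of parameters per statement (around 2,100), so very large lists still need to be batched.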
"I have tried using SqlDataReaders and SqlDataAdapters"
It sounds like you might be open to other APIs. Using Linq2SQL or Linq2Entities:
var someListIds = new List<int> { 1,5,6,7 }; //imagine you load this from where ever
db.MyTable.Where( mt => someListIds.Contains(mt.ID) );
This is safe in terms of avoiding potential SQL injection vulnerabilities and will generate an IN clause. Note, however, that someListIds can be so large that the generated SQL query exceeds the limits on query length; the same is true of any other technique involving the IN clause. You can easily work around that by partitioning the list into large chunks (a sketch follows), and still be tremendously better off than issuing a query per ID.
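A rough chunking sketch (the chunk size, entity name, and context are assumptions; requires using System.Linq and System.Collections.Generic):
// Sketch: query in fixed-size batches so each generated IN clause stays
// within query-length limits.
const int chunkSize = 1000;
var results = new List<MyTable>();    // MyTable: hypothetical entity type
for (int i = 0; i < someListIds.Count; i += chunkSize)
{
    var chunk = someListIds.Skip(i).Take(chunkSize).ToList();
    results.AddRange(db.MyTable.Where(mt => chunk.Contains(mt.ID)));
}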
Use Table-Valued Parameters
With them you can pass a C# DataTable with your values into a stored procedure as a resultset/table which you can join to and do a simple:
SELECT *
FROM YourTable
WHERE NOT EXISTS (SELECT * FROM InputResultSet WHERE YourConditions)
Use the IN operator. Your problem is very common and it has a name: the N+1 performance problem.
Where are you getting the IDs from? If it is from another query, then consider combining the two into one.
Rather than performing a separate query for every single ID that you have, execute one query to get the amount for every ID that you want to check (or, if you have too many IDs to put in one query, split them into batches of a few thousand).
Import the data directly into SQL Server and use a stored procedure to output the data you need.
If you must consume it in the app tier... use the xml datatype to pass the IDs into a stored procedure.
You can import the data from the Excel file into SQL Server as a table (using the import data wizard). Then you can perform a single query in SQL Server where you join this table to your lookup table on the ID field. There are a few more steps to this process, but it's a lot neater than trying to concatenate all the IDs into a much longer query.
I'm assuming a certain amount of access privileges to the server here, but this is what I'd do given the access I normally have. I'm also assuming this is a one-off task. If not, the import of the data to SQL Server can be done programmatically as well.
The IN clause has limits, so if you go with that approach, make sure a batch size is used to process a fixed number of IDs at a time; otherwise you will hit another issue.
As @Robertharvey has noted, if there are not a lot of IDs and there are no transactions occurring, then just pull all the IDs at once into memory into a dictionary-like object and process them there (a sketch follows). Six thousand values is not a lot, and a single select could return them all within a few seconds.
Just remember that if another process is updating the data, your local cached version may be stale.
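A rough sketch of that approach (the column and variable names follow the question; the comparison logic and the decimal type of Amount are assumptions):
// Sketch: load every ID/amount once, then check each Excel row locally.
var amounts = new Dictionary<string, decimal>();
using (var cmd = new SqlCommand("select ID, Amount from table where Amount > 0", sqlCon))
using (var reader = cmd.ExecuteReader())
{
    while (reader.Read())
    {
        amounts[reader["ID"].ToString()] = (decimal)reader["Amount"];
    }
}

// For each row from the Excel file: does it exist with a sufficient amount?
decimal dbAmount;
bool ok = amounts.TryGetValue(excelId, out dbAmount) && dbAmount >= excelAmount;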
There is another way to handle this: build an XML document of the IDs and pass it to a stored procedure. Here is the code for the procedure.
IF OBJECT_ID('GetDataFromDatabase') IS NOT NULL
BEGIN
    DROP PROCEDURE GetDataFromDatabase
END
GO
--Definition
CREATE PROCEDURE GetDataFromDatabase
    @xmlData XML
AS
BEGIN
    DECLARE @DocHandle INT
    DECLARE @idList TABLE (id INT)
    EXEC sp_xml_preparedocument @DocHandle OUTPUT, @xmlData;
    INSERT INTO @idList (id)
    SELECT x.id FROM OPENXML(@DocHandle, '//data', 2) WITH ([id] INT) x
    EXEC sp_xml_removedocument @DocHandle;
    --SELECT * FROM @idList
    SELECT t.Amount, t.Client, t.Pallet
    FROM yourTable t
    INNER JOIN @idList x ON t.id = x.id AND t.Amount > 0;
END
GO
--Uses
EXEC GetDataFromDatabase @xmlData = '<root><data><id>1</id></data><data><id>2</id></data></root>'
You can put any logic in the procedure, and you can pass amounts as well as IDs via the XML. You can pass a huge list of IDs this way; the C# side is sketched below.
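A hedged sketch of the calling side (the variable names are assumptions):
// Sketch: build the XML document of IDs and call the procedure with it.
var sb = new System.Text.StringBuilder("<root>");
foreach (var id in ids)               // ids: the ID list from the Excel file
{
    sb.Append("<data><id>").Append(id).Append("</id></data>");
}
sb.Append("</root>");

using (var cmd = new SqlCommand("GetDataFromDatabase", sqlCon))
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.Add("@xmlData", SqlDbType.Xml).Value = sb.ToString();
    new SqlDataAdapter(cmd).Fill(dataTable);
}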
SqlDataAdapter objects are too heavy for that.
First, use stored procedures; they will be faster.
Second, batch the operation: pass the whole list of identifiers to the database as a single parameter, run one query over those values, and return the processed result.
This will be quick and efficient, as all the data-processing logic stays on the database server.
You can select the whole resultset (or join multiple 'limited' result sets) and save it all to a DataTable. Then you can do selects and updates (if needed) directly on the DataTable, and plug the new data back in afterwards... Not super efficient memory-wise, but often a very good (and sometimes the only) solution when you work in bulk and need it to be very fast.
So if you have thousands of records, it might take a couple of minutes to populate all the records into the DataTable.
Then you can search your table like this:
string findMatch = "ID = '" + value + "'";   // filter expression, e.g. "ID = '123'"
DataRow[] rowsFound = dataTable.Select(findMatch);
Then just loop over the matches with foreach (DataRow dr in rowsFound).

How do I perform a checksum of a MySQL table using C#?

I would like to check whether my MySQL table has been changed; I thought doing a checksum on it might be a good solution. I'm using the official MySQL .NET connector.
Normally I use the MySqlDataReader to get data from my select queries; however, I'm not sure how that works for the checksum query. This is what I have so far.
string query = "checksum table `" + name + "`";
string result = "";
if (isConnected())
{
    MySqlCommand cmd = new MySqlCommand(query, connection);
    cmd.ExecuteNonQuery();
}
I found the checksum query here.
Any help is appreciated!
You have the right query; however, you should understand the possible performance implications.
If the table is MyISAM (which it probably should not be if you are using a more recent version of MySQL) and you have the CHECKSUM=1 option defined for the table, then you can do a QUICK checksum, which should happen fast.
If you have an InnoDB table, you are going to have to perform a full table scan to execute this command. This is probably not viable from a performance standpoint on an actively used table of reasonably large size.
I would suggest looking at the MySQL documentation for your options here.
http://dev.mysql.com/doc/refman/5.5/en/checksum-table.html
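As for reading the result from C#: CHECKSUM TABLE returns a result set (columns Table and Checksum), so use ExecuteReader rather than ExecuteNonQuery. A minimal sketch, reusing the question's variables:
// Sketch: read the checksum value the statement returns.
using (var cmd = new MySqlCommand("checksum table `" + name + "`", connection))
using (var reader = cmd.ExecuteReader())
{
    if (reader.Read())
    {
        result = reader["Checksum"].ToString();   // NULL for views or on error
    }
}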
Another, perhaps simpler, option would be to place a trigger on the table which updates a timestamp in another table each time a record is inserted/updated/deleted. You can then just compare this timestamp against a given date to know if a change has been made in that time period.
Similarly, if your table doesn't already have an "ON UPDATE CURRENT_TIMESTAMP" timestamp field, you can simply add one, place an index on it, and query for the max value of that field. This adds some data and index size overhead to the table, which might not be desirable. It also wouldn't capture deletes (though you could easily check for those by getting a count on your primary index field).

Table name SQL injection

I am working with C#. I need to write a select inline query.
The table name should be taken from config. I cannot write a stored procedure.
SqlCommand myCommand = new SqlCommand();
myCommand.CommandText = "Select * from " + tableName;   // tableName comes from config
myCommand.CommandType = CommandType.Text;
myCommand.Connection = connection;                      // a SqlConnection opened from the configured connection string
How do I avoid SQL injection?
Just create a query with a real param and check for the existence of the table name - something like:
SELECT COUNT(*) FROM SYS.TABLES WHERE NAME = @pYOURTABLENAME
If that returns 1, then you know that the table exists and thus can use it in the SELECT you showed in the question...
However, I strongly recommend trying everything possible to get rid of the need for any code prone to SQL injection! (A sketch combining the existence check with QUOTENAME follows.)
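A hedged sketch of that idea, validating the configured name against sys.tables and letting the server bracket-quote it with QUOTENAME (the variable names are assumptions):
// Sketch: the table name is only used after it has been matched against
// sys.tables, and QUOTENAME brackets it before execution.
var sql = @"
IF EXISTS (SELECT 1 FROM sys.tables WHERE name = @tableName)
BEGIN
    DECLARE @stmt nvarchar(max);
    SET @stmt = N'SELECT * FROM ' + QUOTENAME(@tableName) + N';';
    EXEC sp_executesql @stmt;
END
ELSE
    RAISERROR('Unknown table name', 16, 1);";

using (var cmd = new SqlCommand(sql, connection))
{
    cmd.Parameters.Add("@tableName", SqlDbType.NVarChar, 128).Value = tableName;
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read()) { /* consume rows */ }
    }
}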
I would ensure the table name contains only these characters:
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz[]. -_0123456789
E.g.,
Regex regex = new Regex(@"^[ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz\[\]. \-_0123456789]{1,128}$");  // hyphen escaped so it isn't treated as a character range
if (!regex.IsMatch(tableName)) throw new ApplicationException("Invalid table name");
To do a more comprehensive job, including non-English languages, see this reference on what constitutes a valid table name:
http://msdn.microsoft.com/en-us/library/ms175874.aspx
You need to verify that tableName is appropriate. After some sanity checking (making sure it has no spaces, or other disallowed characters for table names, etc), I would then first query the database for the names of all tables, and programmatically verify that it is one of those table names. Then proceed to run the query you show.
I'd look at moving the SQL to a stored proc and reviewing this article by Erland Sommarskog, as it has some great ideas for using dynamic SQL within stored procs. It discusses a lot of the issues around SQL injection and possible alternatives to dynamic SQL, so I'd certainly look into it.
He also has another great article on ways to use arrays in stored procs. I know you didn't ask for that, but I often refer to these two articles as I think they are quite insightful and provide you with some useful ideas with regards to writing your procedures.
In addition to some of the suggestions linked above, I still have some basic parameter sanitisation mechanisms that I use if I am ever using dynamic SQL. An example of this is as follows;
IF LEN(@TableName) < 5 OR LEN(@TableDisplayName) < 5
BEGIN
    RAISERROR('Please ensure table name and display name are at least 5 characters long', 16, 1)
END
IF NOT (@TableName not like '%[^A-Z]%')
BEGIN
    RAISERROR('The TableName can only contain letters', 16, 1)
END
IF NOT (@TableDisplayName not like '%[^0-9A-Z ]%')
BEGIN
    RAISERROR('The TableDisplayName can only contain letters, numbers or spaces', 16, 1)
END
This, combined with using parameters within your dynamic SQL and executing via sp_executesql, certainly helps to minimise the possibility of a SQL injection attack.

Proper Way to Format SQL Query Within C# Application

I have a C# console application that makes a bunch of queries to a database server.
I frequently need to modify the SQL and would like to simply copy/paste the SQL from my SQL Editor into my C# source without having to reformat the SQL every time.
Currently, the SQL is all on one line... like below:
OleDbDataAdapter da_ssm_servers = new OleDbDataAdapter(@"SELECT * FROM mytable ORDER BY Server;", connSSM);
The SQL is much longer than above with a lot of table JOINS, etc.
I would like to keep the formatting, but don't really want to have to go back and add quotes around each line, etc.
If anyone has any recommendations and examples, it would be appreciated.
I do it like this:
string sql = #"
SELECT *
FROM mytable
ORDER BY Server";
OleDbDataAdapter da_ssm_servers = new OleDbDataAdapter(sql, connSSM);
My recommendation would be to stay away from ad hoc queries like you are using and utilize stored procedures. That would decouple your design, as well as limit your calls to a stored procedure name and possibly parameters.
But if you must use ad hoc queries, then prefix the string with @ and you will be able to span multiple lines without having to surround each line in quotes.
As long as you use the @ syntax, the SQL can span multiple lines just fine.
For example
string sql = #"select
colA,
colB,
colC
from
tableX
inner join tableY
on tableX.colA = tableY.colA
where
colB > 20;"
It seems that your code should work fine. When you start your string with @, you're using a verbatim string literal. Line breaks and formatting are preserved, and you don't need to enclose each line in quotes. You just need an ending quote.
