Concerns about SQL Server 2008 Full Text Search

I have built a T-SQL query like this:
DECLARE @search nvarchar(1000) = 'FORMSOF(INFLECTIONAL,hills) AND FORMSOF(INFLECTIONAL,print) AND FORMSOF(INFLECTIONAL,emergency)'
SELECT * FROM Tickets
WHERE ID IN (
-- unioned subqueries using CONTAINSTABLE
...
)
The GUI for this search will be an aspx page with a single textbox where the user can search.
I plan to somehow construct the search term to be like the example above (@search).
I have some concerns, though:
Is the example search term above the best or only way to include the inflections of all words in the search?
Should I separate the words and construct the search term in C# or T-SQL? I tend to lean toward C# for decisions/looping/construction, but I want your opinion.
I hate building SQL dynamically because of the risk of injection. How can I guard against this?
Should I use FREETEXTTABLE instead? Is there a way to make FREETEXT look for ALL words instead of ANY?
In general, how else would you do this?

I recently used Full-Text Search, so I'll try to answer some of your questions.
• "I hate building sql dynamically because of the risk of injection. How can I guard against this?"
I used a sanitize method like this:
static string SanitizeInput(string searchPhrase)
{
if (searchPhrase.Length > 200)
searchPhrase = searchPhrase.Substring(0, 200);
searchPhrase = searchPhrase.Replace(";", " ");
searchPhrase = searchPhrase.Replace("'", " ");
searchPhrase = searchPhrase.Replace("--", " ");
searchPhrase = searchPhrase.Replace("/*", " ");
searchPhrase = searchPhrase.Replace("*/", " ");
searchPhrase = searchPhrase.Replace("xp_", " ");
return searchPhrase;
}
• Should I use FREETEXTTABLE instead? Is there a way to make FREETEXT look for ALL words instead of ANY?
I did use FREETEXTTABLE, but I needed any of the words. As much as I've read about it (and I've read quite a bit), you have to use CONTAINSTABLE to search for ALL words, or different combinations. FREETEXTTABLE seems to be the lighter solution, but not the one to pick when you want deeper customizations.

Dan, I like your SanitizeInput method. I refactored it to make it more compact and enhance performance a little.
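// Built once and reused; constructing a compiled Regex on every call would cost more than it saves.
static readonly Regex UnsafeTokens = new Regex(@";|'|--|xp_|/\*|\*/", RegexOptions.Compiled);
static string SanitizeInput(string searchPhrase, int maxLength)
{
    // Truncate first, then replace each rejected character or token with a space.
    string truncated = searchPhrase.Length > maxLength ? searchPhrase.Substring(0, maxLength) : searchPhrase;
    return UnsafeTokens.Replace(truncated, " ");
}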
static string SanitizeInput(string searchPhrase)
{
const int MAX_SEARCH_PHRASE_LENGTH = 200;
return SanitizeInput(searchPhrase, MAX_SEARCH_PHRASE_LENGTH);
}
I agree that FreeTextTable is too lightweight of a solution.

In your example, you have the @search variable already defined. As a rule of thumb, you shouldn't include dynamically concatenated text into raw SQL, due to the risk of injection. However, you can of course set the value of @search as a parameter on the calling command object from your application. This completely negates the risk of injection attacks.
I would recommend construction of the search term in C#; passing the final search term in as a parameter like already mentioned.
As far as I recall, FREETEXTTABLE uses word breakers to completely decompose the search terms into their individual components. However, the FREETEXTTABLE operator automatically decomposes words into inflectional equivalents also, so you won't have to construct a complex CONTAINSTABLE operator if you decide to use it.
You could INNER JOIN the results of multiple FREETEXTTABLE queries to produce an equivalent AND result.
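As a rough illustration of building the search term in C#, here is a minimal sketch. The class and method names (SearchTermBuilder, BuildInflectionalTerm) are hypothetical, and it assumes that only letters and digits should survive from the user's input. The resulting string is meant to be passed as the value of the @search parameter, not concatenated into the SQL text.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class SearchTermBuilder
{
    // Hypothetical helper: splits the raw input on anything that is not a
    // letter or digit, then wraps each word in FORMSOF(INFLECTIONAL, "...")
    // and joins the pieces with AND so that every word must match.
    public static string BuildInflectionalTerm(string userInput)
    {
        var clauses = Regex.Split(userInput ?? string.Empty, @"[^\p{L}\p{N}]+")
                           .Where(word => word.Length > 0)
                           .Select(word => "FORMSOF(INFLECTIONAL, \"" + word + "\")");
        return string.Join(" AND ", clauses);
    }
}
```

For the input "hills print emergency" this yields three FORMSOF clauses joined by AND. An empty or punctuation-only input yields an empty string, so the caller would probably want to skip the query in that case rather than hand an empty term to CONTAINSTABLE.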

All of our searches are on columns in the database that have predefined valid characters.
Our search algorithm incorporates this with a regex that only allows these predefined characters. Because of this, escaping in the search string is not needed. Our regex weeds out any injection attempts in the web code (asp & aspx). For standard comments from the users, we use escaping that changes all characters that may be used for harm in SQL, ASP, ASPX, & Javascript.
The TransStar site http://latranstar.tann.com/ is using an extended form of Soundex to search for street names, addresses and cities anywhere in Southern California. The Soundex by itself eliminates any need for anti-injection code since it operates only on alpha characters.

Related

SQL Like Operator (single quotation) [duplicate]

The MySQL documentation says that it should be \'. However, both scite and mysql show that '' works. I tried it and it works. What should I do?

The MySQL documentation you cite actually says a little bit more than you mention. It also says,
A “'” inside a string quoted with “'” may be written as “''”.
(Also, you linked to the MySQL 5.0 version of Table 8.1. Special Character Escape Sequences, and the current version is 5.6 — but the current Table 8.1. Special Character Escape Sequences looks pretty similar.)
I think the Postgres note on the backslash_quote (string) parameter is informative:
This controls whether a quote mark can be represented by \' in a string literal. The preferred, SQL-standard way to represent a quote mark is by doubling it ('') but PostgreSQL has historically also accepted \'. However, use of \' creates security risks...
That says to me that using a doubled single-quote character is a better overall and long-term choice than using a backslash to escape the single-quote.
Now if you also want to add choice of language, choice of SQL database and its non-standard quirks, and choice of query framework to the equation, then you might end up with a different choice. You don't give much information about your constraints.

Standard SQL uses doubled-up quotes; MySQL has to accept that to be reasonably compliant.
'He said, "Don''t!"'

What I believe user2087510 meant was:
name = 'something'
name = name.replace("'", "\\'")
I have also used this with success.

There are three ways I am aware of. The first not being the prettiest and the second being the common way in most programming languages:
Use another single quote: 'I mustn''t sin!'
Use the escape character \ before the single quote: 'I mustn\'t sin!'
Use double quotes to enclose string instead of single quotes: "I mustn't sin!"

Just write '' in place of ', that is, the single quote character twice.

Here's an example:
SELECT * FROM pubs WHERE name LIKE "%John's%"
Just use double quotes to enclose the single quote.
If you insist in using single quotes (and the need to escape the character):
SELECT * FROM pubs WHERE name LIKE '%John\'s%'

Possibly off-topic, but maybe you came here looking for a way to sanitise text input from an HTML form, so that when a user inputs the apostrophe character, it doesn't throw an error when you try to write the text to an SQL-based table in a DB. There are a couple of ways to do this, and you might want to read about SQL injection too.
Here's an example of using prepared statements and bound parameters in PHP:
$input_str = "Here's a string with some apostrophes (')";
// sanitise it before writing to the DB (assumes PDO)
$sql = "INSERT INTO `table` (`note`) VALUES (:note)";
try {
$stmt = $dbh->prepare($sql);
$stmt->bindParam(':note', $input_str, PDO::PARAM_STR);
$stmt->execute();
} catch (PDOException $e) {
return $dbh->errorInfo();
}
return "success";
In the special case where you may want to store your apostrophes using their HTML entity references, PHP has the htmlspecialchars() function, which will convert them to &#039; (provided the ENT_QUOTES flag is in effect). As the comments indicate, this should not be used as a substitute for proper sanitisation, as per the example given.

Replace the apostrophes in the string:
value = value.replace(/'/g, "\\'");
where value is the string that is going to be stored in your database.
There is also an NPM package for this that you can look into:
https://www.npmjs.com/package/mysql-apostrophe

I think if you have any value containing an apostrophe, you can add another apostrophe before it.
E.g. 'This is John's place'
Here MySQL sees two pieces: 'This is John' and 's place'
You can write 'This is John''s place' instead. I think it should work that way.

In PHP I like using mysqli_real_escape_string() which escapes special characters in a string for use in an SQL statement.
see https://www.php.net/manual/en/mysqli.real-escape-string.php

asp.net c# allowing users to search string using multiple terms

I am trying to add a search feature to my application which will allow someone to enter several words and search for those in my data.
Doing single words and phrases is simple:
if (x.Title.ToUpper().Contains(tbSearch.Text.ToUpper()) || x.Description.ToUpper().Contains(tbSearch.Text.ToUpper()))
BUT how do I work out if someone entered a search for "red car" and the title was "the car that is red"? I know I could split on SPACE and then search for each term but this seems over complicated and I would also need to strip out non word characters.
I've been looking at using RegExes but am not sure if it would search for items in order or any order.
I guess I'm trying to basically create a simple google search in my application.

Have you considered using a proper search engine such as Lucene? The StandardAnalyzer in Lucene uses the StandardTokenizer, which takes care of (some) special characters, when tokenizing. It would for example split "red-car" into the tokens "red car", thereby "removing" special characters.
In order to search in multiple fields in a Lucene index, you could use the MultiFieldQueryParser.

I think you are looking for something like this:
public static bool HasWordsContaining(this string searchCriteria, string toFilter)
{
var regex = new Regex(string.Format("^{0}| {0}", Regex.Escape(toFilter)), RegexOptions.IgnoreCase);
return regex.IsMatch(searchCriteria);
}
Usage:
someList.Where(x=>x.Name.HasWordsContaining(searchedText)).ToList();
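For the "red car" versus "the car that is red" case from the question, splitting the query and requiring every term may be less complicated than it sounds. The following is only a sketch under some assumptions: the names MultiTermSearch and ContainsAllTerms are made up for illustration, terms are separated by spaces, tabs, commas or semicolons, and a plain case-insensitive substring match is good enough.

```csharp
using System;
using System.Linq;

static class MultiTermSearch
{
    // Returns true when every term of the query occurs somewhere in the text,
    // in any order, ignoring case.
    public static bool ContainsAllTerms(string text, string query)
    {
        if (string.IsNullOrEmpty(text) || string.IsNullOrWhiteSpace(query))
            return false;

        var terms = query.Split(new[] { ' ', '\t', ',', ';' },
                                StringSplitOptions.RemoveEmptyEntries);
        return terms.All(term =>
            text.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0);
    }
}
```

Usage could look like someList.Where(x => MultiTermSearch.ContainsAllTerms(x.Title, tbSearch.Text)). Note that this matches substrings, so "red" would also match "hundred"; whole-word matching would need a word-boundary check on top.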

You might use FREETEXTTABLE (or CONTAINSTABLE, which is joined in the same way) for this. You can use a stored procedure and pass in the search string.
USE AdventureWorks2012
GO
SELECT
KEY_TBL.RANK,
FT_TBL.Description
FROM
Production.ProductDescription AS FT_TBL
INNER JOIN
FREETEXTTABLE
(
Production.ProductDescription,
Description,
'perfect all-around bike'
) AS KEY_TBL
ON FT_TBL.ProductDescriptionID = KEY_TBL.[KEY]
ORDER BY KEY_TBL.RANK DESC
GO
https://msdn.microsoft.com/en-us/library/ms142583.aspx

How to customize Lucene.NET to search for words with symbols without case-sensitivity (e.g. "C#" or ".net")?

The standard analyzer does not work. From what I can understand, it changes this to a search for c and net
The WhitespaceAnalyzer would work but it's case sensitive.
The general rule is search should work like Google so hoping it's a configuration thing considering .net, c# have been out there for a while or there's a workaround for this.
Per the suggestions below, I tried the custom WhitespaceAnalyzer, but keywords that are separated by a comma with no space are then not handled correctly, e.g.
java,.net,c#,oracle
will not be returned while searching which would be incorrect.
I came across PatternAnalyzer which is used to split the tokens but can't figure out how to use it in this scenario.
I'm using Lucene.Net 3.0.3 and .NET 4.0

Write your own custom analyzer class similar to SynonymAnalyzer in Lucene.Net – Custom Synonym Analyzer. Your override of TokenStream could solve this by pipelining the stream using WhitespaceTokenizer and LowerCaseFilter.
Remember that your indexer and searcher need to use the same analyzer.
Update: Handling multiple comma-delimited keywords
If you only need to handle unspaced comma-delimited keywords for searching, not indexing then you could convert the search expression expr as below.
expr = expr.Replace(',', ' ');
Then pass expr to the QueryParser. If you want to support other delimiters like ';' you could do it like this:
var terms = expr.Split(new char[] { ',', ';'} );
expr = String.Join(" ", terms);
But you also need to check for a phrase expression like "sybase,c#,.net,oracle" (expression includes the quote " chars) which should not be converted (the user is looking for an exact match):
expr = expr.Trim();
if (!(expr.StartsWith("\"") && expr.EndsWith("\"")))
{
expr = expr.Replace(',', ' ');
}
The expression might include both a phrase and some keywords, like this:
"sybase,c#,.net,oracle" server,c#,.net,sybase
Then you need to parse and translate the search expression to this:
"sybase,c#,.net,oracle" server c# .net sybase
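One way the translation above might be done is to walk the expression once and only replace delimiters that sit outside a quoted phrase. This is a sketch, not Lucene.Net API: SearchExpression and NormalizeDelimiters are hypothetical names, and it assumes quotes are balanced and never escaped.

```csharp
using System.Text;

static class SearchExpression
{
    // Replaces ',' and ';' with a space unless they appear inside a
    // double-quoted phrase, which is left untouched for exact matching.
    public static string NormalizeDelimiters(string expr)
    {
        var result = new StringBuilder(expr.Length);
        bool insidePhrase = false;

        foreach (char c in expr)
        {
            if (c == '"')
                insidePhrase = !insidePhrase;   // toggle on every quote char

            bool isDelimiter = c == ',' || c == ';';
            result.Append(!insidePhrase && isDelimiter ? ' ' : c);
        }
        return result.ToString();
    }
}
```

The returned string can then be handed to the QueryParser as before.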
If you also need to handle unspaced comma-delimited keywords for indexing then you need to parse the text for unspaced comma-delimited keywords and store them in a distinct field eg. Keywords (which must be associated with your custom analyzer). Then your search handler needs to convert a search expression like this:
server,c#,.net,sybase
to this:
Keywords:server Keywords:c# Keywords:.net Keywords:sybase
or more simply:
Keywords:(server c# .net sybase)

Use the WhitespaceAnalyzer and chain it with a LowerCaseFilter.
Use the same chain at search and index time. By converting everything to lower case, you actually make it case insensitive.
According to your problem description, that should work and be simple to implement.

For others who might be looking for an answer as well:
the final answer turned out to be to create a custom TokenFilter and a custom Analyzer using
that token filter along with WhitespaceTokenizer, LowerCaseFilter etc. All in all it is about 30 lines of code. I will create a blog post and post the link here when I do; I have to create a blog first!
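The custom analyzer itself is not shown in the thread. Purely to illustrate the tokenization rule such a chain is aiming for (split on whitespace and on commas, lower-case, keep symbols like # and .), here is a plain C# sketch that does not use Lucene.Net at all. KeywordTokens and Tokenize are hypothetical names, and treating ';' as a separator is an assumption.

```csharp
using System;
using System.Linq;

static class KeywordTokens
{
    // Emulates the intended analysis: whitespace/comma tokenizing followed
    // by lower-casing, leaving characters such as '#' and '.' intact.
    public static string[] Tokenize(string text)
    {
        return (text ?? string.Empty)
            .Split(new[] { ' ', '\t', '\r', '\n', ',', ';' },
                   StringSplitOptions.RemoveEmptyEntries)
            .Select(token => token.ToLowerInvariant())
            .ToArray();
    }
}
```

With this rule, "Java,.NET,C#,Oracle" becomes the tokens java, .net, c# and oracle, which is the behaviour the question asks for at both index and search time.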

method to prepare string for a sql command

I need to create a string method that takes in a string and
escapes it so that it can be used in a database SQL query, for example:
"This is john's dog" ==> "This is john''s dog"
"This is a 'quoted' string" ==> "This is a ''quoted'' string"
I want my method to look something like this:
string PrepareForSQLCommand(string text)
{
...
}
Anyway, I don't know all of the characters that need to be escaped in SQL query.
I am not sure what the best approach is to do this, or if there is some
existing robust built-in stuff to do this in C#.
Apologies for not mentioning this earlier: I DO NOT HAVE THE OPTION TO USE PARAMETRIZED QUERIES.
Ted

The usual way to do this would be to use a parametrised query as part of a SqlCommand.
SqlCommand command = new SqlCommand("SELECT * FROM MyTable WHERE MyCol = @param", connection);
command.Parameters.AddWithValue("@param", "This is john's dog");
The framework then ensures safety for you, which is less error-prone than trying to work out all of the possible injection attacks for yourself.

Trigger Warning. This answer is in response to the following statement:
I do not have the option to use parametrized queries.
Please do not up-vote this answer and please don't accept this as the correct way of doing things. I don't know why the OP cannot use parametrized queries, so I am answering that specific question and not recommending this is how you should do this. If you are not the OP, please read the other answer I have given. Also, please bear in mind the above constraint before down-voting. Thanks.
End of trigger warning!
For Microsoft SQL Server (the answer is different depending on the server) you will need to escape the single quote characters by doubling them:
' becomes ''
But before you escape these characters, you should reject any character not on your white-list. This is because there are lots of very clever tricks out there and white-list validation is more secure than simply escaping characters you know are bad.
Regex whiteList = new Regex("[^'a-zA-Z0-9 -]");
query = whiteList.Replace(query, "");
For example, this would remove [ and ] characters, and ';' characters. You may need to adjust the regex to match your expectations as this is a very restrictive white-list - but you know what kind of data you are expecting to see in your application.
I hope this helps. Feel free to check out the OWASP website for more details on security and if you can find a way of using parametrized queries you'll sleep all the better for it.
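Putting the two steps from this answer together (white-list first, then escape), the method the question asks for might look roughly like the sketch below. It reuses the very restrictive white-list from above and assumes Microsoft SQL Server quoting rules; the class name SqlText is made up for illustration. It remains a fallback for when parametrized queries really are unavailable.

```csharp
using System.Text.RegularExpressions;

static class SqlText
{
    // Matches every character that is NOT on the white-list
    // (single quote, ASCII letters, digits, space, hyphen).
    static readonly Regex NotWhiteListed = new Regex("[^'a-zA-Z0-9 -]");

    public static string PrepareForSQLCommand(string text)
    {
        // Step 1: drop anything outside the white-list.
        string filtered = NotWhiteListed.Replace(text ?? string.Empty, "");
        // Step 2: escape the remaining single quotes by doubling them.
        return filtered.Replace("'", "''");
    }
}
```

With this, "This is john's dog" becomes "This is john''s dog", matching the examples in the question, while characters such as [, ] and ; are removed outright.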

C#: most readable string concatenation. best practice [duplicate]

There are several ways to concat strings in everyday tasks when performance is not important.
result = a + ":" + b
result = string.Concat(a, ":", b)
result = string.Format("{0}:{1}", a, b);
StringBuilder approach
... ?
what do you prefer and why if efficiency doesn't matter but you want to keep the code most readable for your taste?

It depends on the use. When you just want to concat two strings, using a + b is just much more readable than string.Format("{0}{1}", a, b). However, when it gets more complex, I prefer using string.Format. Compare this:
string x = string.Format("-{0}- ({1})", a, b);
against:
string x = "-" + a + "- (" + b + ")";
I think that in most cases it is very easy to spot the most readable way to do things. In the cases where it is debatable which one is more readable, just pick one, because your boss isn't paying for these pointless discussions ;-)

string.Format for me, but in practice I use whichever is fit for purpose, taking into account performance and readability.
If it was two variables I'd use:
string.Concat(str1, str2);
If it contained a constant or something that requires formatting then:
string.Format("{0} + {1} = {2}", x, y, x + y);
Or for something like an SQL query (note the trailing space at the end of each piece):
string SqlQuery = "SELECT col1, col2, col3, col4 " +
"FROM table t " +
"WHERE col1 = 1";
And string builder when performance matters.

String.Format(...) is slowest.
For simple concatenations which don't take place in a loop, use String.Concat(...) or the + operator, which translate to the same under the hood, afaik. What is more readable is very subjective.
Using a StringBuilder for simple concatenations is over-the-top as well and most likely has too much overhead. I'd only use it in a loop.
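For reference, the four options from the question side by side. This is only a small sketch (ConcatDemo and BuildAll are made-up names) to show that they produce the same string, so the choice really is about readability and context rather than result.

```csharp
using System;
using System.Text;

static class ConcatDemo
{
    // Builds "a:b" in the four ways listed in the question and returns
    // all results so they can be compared.
    public static string[] BuildAll(string a, string b)
    {
        string plus = a + ":" + b;
        string concat = string.Concat(a, ":", b);
        string format = string.Format("{0}:{1}", a, b);
        string builder = new StringBuilder().Append(a).Append(':').Append(b).ToString();

        return new[] { plus, concat, format, builder };
    }
}
```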

For something like this (which I'm guessing is being sent to the UI), I would definitely prefer String.Format. It allows the string to be internationalized easily; you can grep for calls to String.Format and replace them with your translating format.

My personal preference is:
I find the + approach the most readable and only use Format() or a StringBuilder if there is a good reason (i18n, performance etc) for it. I (almost) never use Concat.
I think I find the + approach easier to read than Format() simply because I don't have to skip ahead to the end to see what variables are put into in the {} place-holders. And if the place-holders aren't in numeric order, it gets even harder to read imo.
But I guess for larger projects it makes sense to simply enforce using Format by a style guide just in case the code might be (re-)used in a project requiring i18n later on.

string.Format for a few concats. For more I use the StringBuilder approach, even if performance is not important. There is a team agreement I have to follow.

I prefer String.Format for small strings, and StringBuilder for larger ones. My main reason is readability. It's a lot more readable to me to use String.Format (or StringBuilder.AppendFormat()), but I have to admit that that is just personal preference.
For really big text generation, you might want to consider using the new (VS2010) T4 Preprocessed Templates - they are really nice.
Also, if you're ever in VB.NET, I like the XML literal technique Kathleen Dollard talked about in episode 152 of hanselminutes.

Prefer to use:
String.Concat for simple concatenations like String.Concat("foo", bar);
String.Format for complex formatting like String.Format("<a href=\"{0}\">{1}</a>", url, text);
StringBuilder for massive concatenations like:
var sb = new StringBuilder();
sb.AppendLine("function fooBar() {");
sb.AppendLine(String.Join(Environment.NewLine, blah));
sb.AppendLine("}");
page.RegisterClientScript(page.GetType(), sb.ToString());
Prefer to avoid "foo" + "bar" (as well as if (foo == "bar")), and especially String.Format("{0}{1}", foo, bar) and
throw new Exception("This code was" +
"written by developer with" +
"13\" monitor");
