Using query expressions for Unicode strings with LINQ

I am working with LINQ and have a database with columns for storing local content (non-English characters). Now I want to make a query using LINQ as follows:
var desc = from p in db.GetDesc
where p.Category.Contains("xxxx".ToString())
orderby p.Date descending
select p;
Here the Category column contains Unicode strings and the above query doesn't work. How can I query non-English text with LINQ?

Unicode in general should work fine with Linq to SQL and Linq to Entities against SQL Server (which I assume you're using). In fact your query should be
var desc = from p in db.GetDesc
where p.Category.Contains("xxxx")
orderby p.Date descending
select p;
There's no need to use .ToString(), since "xxxx" is already a Unicode string.
The problem seems to be with SQL Server. I tried your query against a table containing your Ethiopian characters, and as you say it doesn't work. If I query for .Contains("ስፖርት") then all rows are returned.
Running the SQL directly has the same result.
Trying a simple query like this fails (returns all rows)
select * from TestTable where Title like N'%' + NCHAR(0x1275) + N'%'
Here 0x1275 is the Unicode code point of the ት character.
The SQL documentation for NCHAR mentions a limit of 4000, but that figure is the maximum length of an nchar(n) value, not a code point limit; NCHAR() itself accepts integers from 0 through 65535. Here 0x1275 = 4725, which is within that range, so the failure is more likely a comparison (collation) issue than an NCHAR limit. Either way, in my tests SQL Server (even 2012) did not match Ethiopian characters correctly.
Testing reveals that running the above simple query with NCHAR(3129) succeeds (in my case returns no rows), but >= 3130 fails (returns all rows).
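The hex-to-decimal conversion quoted in this answer can be double-checked in C#. This is only a sanity check of the arithmetic, and the class and method names are made up for illustration; it says nothing about how SQL Server compares the character.

```csharp
using System;

public static class CodePointCheck
{
    // Returns the UTF-16 code unit of a char as a decimal integer.
    public static int CodePointOf(char c)
    {
        return (int)c;
    }

    // True when the character's code unit equals the given value.
    public static bool HasCodePoint(char c, int expected)
    {
        return CodePointOf(c) == expected;
    }
}
```

Calling CodePointCheck.CodePointOf('\u1275') gives the decimal value of the ት character, which can then be compared with the 0x1275 used in the NCHAR() call.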

Related

Basic LINQ query to SQL C#

I have this LINQ query in C# for querying a db4o database.
IEnumerable<internetRecord> searchResult = from internetRecord ie in database
where ie.GSrecordID.Contains(txtSearchString.Text)
select ie;
What would be the equivalent query in SQL? (needed for comparison purposes) I have not worked with SQL much in the past and looking at it after using LINQ for a while it seems confusing.

SELECT *
FROM MyTable
WHERE GSRecordID LIKE '%txtSearchString%'

Select * from internetrecord where GSrecordID like '%your comparison string%'
This assumes internetrecord is your SQL table and GSrecordID is a column of that table.

I don't know much about db4o, but in standard SQL it would be:
SELECT * FROM internetRecord
WHERE GSrecordID LIKE '%txtSearchString%'

You can do something like this:
var result = database.Where(x => x.GSrecordID.Contains(txtSearchString.Text));
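To compare the two side by side, here is a minimal in-memory sketch of the same filter. The InternetRecord class and the Search helper are stand-ins invented for this example, not the asker's actual db4o types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the asker's record type.
public class InternetRecord
{
    public string GSrecordID { get; set; }
}

public static class SearchDemo
{
    // In-memory counterpart of:
    //   SELECT * FROM internetRecord WHERE GSrecordID LIKE '%' + search + '%'
    public static List<InternetRecord> Search(IEnumerable<InternetRecord> records, string search)
    {
        return records.Where(r => r.GSrecordID.Contains(search)).ToList();
    }
}
```

One difference to keep in mind when comparing results: string.Contains is case-sensitive, while the case sensitivity of LIKE depends on the database collation, and characters such as % and _ are wildcards in LIKE but ordinary characters in Contains.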

LINQ to SQL will not generate sargable query

I'm using LINQ To Sql (not Entity Framework), the System.Data.Linq.DataContext library, hitting a SQL Server 2005 database and using .Net Framework 4.
The table dbo.Dogs has a column "Active" of type CHAR(1) NULL. If I was writing straight SQL the query would be:
SELECT * FROM dbo.Dogs where Active = 'A';
The LINQ query is this:
from d in myDataContext.Dogs where d.Active == 'A' select d;
The SQL that gets generated from the above LINQ query converts the Active field to UNICODE. This means I cannot use the index on the dbo.Dogs.Active column, slowing the query significantly:
SELECT [t0].Name, [t0].Active
FROM [dbo].[Dog] AS [t0]
WHERE UNICODE([t0].[Active]) = @p1
Is there anything I can do to stop Linq to Sql from inserting that UNICODE() call (and thus losing the benefit of my index on dogs.Active)? I tried wrapping the parameters using the EntityFunctions.AsNonUnicode() method, but that did no good (it inserted a CONVERT() to NVARCHAR instead of UNICODE() in the generated sql), eg:
...where d.Active.ToString() == EntityFunctions.AsNonUnicode('A'.ToString());

LINQ is meant to make it easier to write queries and does not always generate optimal SQL. When high performance is required it is sometimes more efficient to write raw SQL directly against the database. The LINQ DataContext supports mapping SQL results to entities, just as LINQ queries do.
In your case I would suggest writing:
IEnumerable<Dog> results = db.ExecuteQuery<Dog>(
"SELECT * FROM dbo.Dogs where Active = {0}",
'A');

This is an old question, but I bumped into this recently.
Instead of writing
from d in myDataContext.Dogs where d.Active == 'A' select d;
Write
from d in myDataContext.Dogs where d.Active.Equals('A') select d;
This will produce the desired SQL without having to resort to any of the "hacks" mentioned in other answers. I can't say why for certain.
I've posted that as a question, so we'll see if we get any good answers.

There's not much you can do about the way LINQ queries are translated into SQL statements, but you can write a stored procedure that contains your queries and call that SP as a LINQ2SQL function. This way you should get the full benefit of SQL Server optimizations.

You can do a little hack (as it is often required with LINQ to SQL and EF). Declare the property as NCHAR in the dbml. I hope that will remove the need to do the UNICODE conversion. We are tricking L2S in a benign way with that.
Maybe you need to also insert the EntityFunctions.AsNonUnicode call to make the right hand side a non-unicode type.
You can also try mapping the column as varchar.

How to perform a count on an arbitrary query (possibly containing an order by)

I have been tasked with updating the internal framework we use in-house. One of the things the framework does is take a query and return the number of rows the query produces. (The framework makes heavy use of DataReaders, so we need the total beforehand for UI things.)
The query that the count needs to be done on can differ from project to project. (SQL injection is not an issue: the query is not from user input, it is hard-coded by another programmer when they use the framework for their project.) I was told that having the programmers write a second query for the count is unacceptable.
Currently the solution is to do the following (I did not write this, I was just told to fix it).
//executes query and returns record count
public static int RecordCount(string SqlQuery, string ConnectionString, bool SuppressError = false)
{
    //SplitLeft is just myString.Substring(0, myString.IndexOf(pattern)) with some error checking. and InStr is just a wrapper for IndexOf.
    //remove order by clause (breaks count(*))
    if (Str.InStr(0, SqlQuery.ToLower(), " order by ") > -1)
        SqlQuery = Str.SplitLeft(SqlQuery.ToLower(), " order by ");
    try
    {
        //execute query
        using (SqlConnection cnSqlConnect = OpenConnection(ConnectionString, SuppressError))
        using (SqlCommand SqlCmd = new SqlCommand("select count(*) from (" + SqlQuery + ") as a", cnSqlConnect))
        {
            SqlCmd.CommandTimeout = 120;
            return (Int32)SqlCmd.ExecuteScalar();
        }
    }
    catch (Exception ex)
    {
        if (SuppressError == false)
            MessageBox.Show(ex.Message, "Sql.RecordCount()");
        return -1;
    }
}
However, it breaks on queries like this one (again, not my query, I just need to make it work):
select [ClientID], [Date], [Balance]
from [Ledger]
where Seq = (select top 1 Seq
             from [Ledger] as l
             where l.[ClientID] = [Ledger].[ClientID]
             order by [Date] desc, Seq desc)
  and Balance <> 0
because it removes everything after the first order by, which breaks the query. I thought I might go from simple string matching to a more complicated parser, but before I do that I wanted to ask if there is a better way.
UPDATE: The order by clause is dropped because if you include it, using my method or a CTE, you get the error "The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP or FOR XML is also specified."
Some more details: This framework is used for writing conversion applications. We write apps to pull data from a client's old database and move it into our database format when a customer buys our CRM software. Often we are working with source tables that are poorly designed and can be several gigabytes in size. We do not have the resources to hold the whole table in memory, so we use a DataReader to pull the data out so that everything is not in memory at once. However, a requirement is a progress bar with the total number of records to be processed. This RecordCount function is used to figure out the maximum of the progress bar. It works fairly well; the only snag is that if the programmer writing the conversion needs to order the data output, having an order by clause in the outermost query breaks count(*).
Partial Solution: I came up with this while trying to figure it out. It will not work 100% of the time, but I think it will be better than the current solution.
If I find an order by clause, I check whether the first thing in the query is a select (with no top following it), and if so I replace that beginning text with select top 100 percent. It works better, but I am not posting this as a solution because I am hoping for a universal one.
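The partial solution described here could be sketched roughly as follows. This is an illustration only: OrderBySafeCount and WrapForCount are made-up names, and the regular expressions are assumptions about what "the first thing in the query is a select" means.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OrderBySafeCount
{
    // Matches a leading SELECT that is not already followed by TOP.
    private static readonly Regex LeadingSelect =
        new Regex(@"^\s*select\s+(?!top\b)", RegexOptions.IgnoreCase);

    // Wraps the query in "select count(*) from (...) as a". If the query
    // contains an ORDER BY anywhere, the leading SELECT gets TOP 100 PERCENT
    // so that the ORDER BY is accepted inside the derived table.
    public static string WrapForCount(string sqlQuery)
    {
        string inner = sqlQuery;
        if (Regex.IsMatch(sqlQuery, @"\border\s+by\b", RegexOptions.IgnoreCase))
        {
            // Replace only the first (leading) match.
            inner = LeadingSelect.Replace(sqlQuery, "select top 100 percent ", 1);
        }
        return "select count(*) from (" + inner + ") as a";
    }
}
```

As the question says, this does not cover every case. For example, a query starting with SELECT DISTINCT would end up with TOP in front of DISTINCT, which is not valid T-SQL, and a query starting with a WITH clause is left untouched.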

Assuming you aren't going to see anything but fairly ordinary select statements, I don't think you need a full-on SQL parser to do what you want. You can reasonably make the assumption that you've got syntactically valid SQL. You do need to build a tokenizer (lexical analyzer), though.
The lexical analysis needed for Transact-SQL is pretty simple. The token list consists of (off the top of my head, since it's been a while since I had to do this):
whitespace
two types of comments:
    -- style comments
    /* ... */ style comments
three types of quoted literals:
    string literals (e.g., 'my string literal'), and
    two flavors of quoting reserved words for use as column or object names:
        ANSI/ISO style, using double quotes (e.g., "table")
        Transact-SQL style, using square brackets (e.g., [table])
hex literals (e.g., 0x01A2F)
numeric literals (e.g., 757, -3218, 5.4 or -7.6E-32, 5.0m, $5.3201, etc.)
words, reserved or not: a Unicode letter, underscore ('_'), at-sign ('@') or hash ('#'), followed by zero or more Unicode letters, decimal digits, underscores ('_') or the at-, dollar- or hash-signs ('@', '$' or '#')
operators, including parentheses
It can pretty much all be done with regular expressions. If you were using Perl, you'd be done in a day, easy. It'll probably take a bit longer in C#, though.
I would probably treat comments as whitespace and collapse multiple sequences of whitespace and comment into a single whitespace token as it facilitates the recognition of constructs such as order by.
The reason you don't need a parser is that you don't really care very much about the parse tree. What you do care about is nested parentheses. So...
Once you've gotten a lexical analyzer that emits a stream of tokens, all you need to do is eat and discard tokens counting open/closing parentheses until you see a 'from' keyword at parenthetical depth 0.
Write select count(*) into your StringBuilder.
Start appending tokens (including the from) into the StringBuilder until you see an 'order by' at parenthetical depth 0. You'll need to build a certain amount of look-ahead into your lexer to do this (see my earlier note regarding the collapsing of sequences of whitespace and/or comments into a single whitespace token).
At this point, you should be pretty much done. Execute the query.
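A rough C# sketch of the steps above, under the same assumptions (syntactically valid, fairly ordinary select statements). The class name CountQueryBuilder and the regular expression are my own illustration, not a complete T-SQL lexer: hex and numeric literals simply fall through as single-character tokens, which is enough for copying them unchanged.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class CountQueryBuilder
{
    // One alternative per token class. Order matters: whitespace/comments and
    // quoted text are tried before plain words and single characters.
    private static readonly Regex Token = new Regex(
        @"(?<ws>(\s+|--[^\r\n]*|/\*.*?\*/)+)" +         // whitespace and comments, collapsed
        @"|(?<str>N?'([^']|'')*')" +                     // string literal, '' is an escaped quote
        @"|(?<id>\[[^\]]*\]|""[^""]*"")" +               // [bracketed] or "quoted" identifier
        @"|(?<word>[\p{L}_@#][\p{L}\p{Nd}_@$#]*)" +      // keyword or plain identifier
        @"|(?<open>\()|(?<close>\))" +                   // parentheses
        @"|(?<other>.)",                                 // anything else, one character at a time
        RegexOptions.Singleline);

    public static string ToCountQuery(string sql)
    {
        var sb = new StringBuilder("select count(*)");
        MatchCollection tokens = Token.Matches(sql);
        int depth = 0;          // current parenthetical depth
        bool seenFrom = false;  // true once the top-level FROM has been found

        for (int i = 0; i < tokens.Count; i++)
        {
            Match t = tokens[i];
            if (t.Groups["open"].Success) depth++;
            else if (t.Groups["close"].Success) depth--;

            if (depth == 0 && t.Groups["word"].Success)
            {
                if (!seenFrom && IsWord(t, "from"))
                {
                    seenFrom = true;   // discard the select list, keep everything from here
                    sb.Append(' ');
                }
                else if (seenFrom && IsWord(t, "order") && NextWordIs(tokens, i, "by"))
                {
                    break;             // top-level ORDER BY: stop copying tokens
                }
            }

            if (seenFrom)
                sb.Append(t.Groups["ws"].Success ? " " : t.Value);
        }
        return sb.ToString().TrimEnd();
    }

    private static bool IsWord(Match t, string word)
    {
        return t.Groups["word"].Success &&
               t.Value.Equals(word, StringComparison.OrdinalIgnoreCase);
    }

    // Look-ahead: skips whitespace/comment tokens and checks the next real token.
    private static bool NextWordIs(MatchCollection tokens, int i, string word)
    {
        int j = i + 1;
        while (j < tokens.Count && tokens[j].Groups["ws"].Success) j++;
        return j < tokens.Count && IsWord(tokens[j], word);
    }
}
```

Passing the result of CountQueryBuilder.ToCountQuery(sqlQuery) to the existing SqlCommand would replace the string-splitting step. An order by inside a subquery is kept because it sits at depth 1 or deeper.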
NOTES
Parameterized queries likely won't work.
Recursive queries, with a CTE and a with clause, will probably get broken.
This will discard anything past the ORDER BY clause: if the query uses a query hint, a FOR clause, or COMPUTE/COMPUTE BY, your results will likely differ from the original query (especially with any compute clauses, since those break up the query's result sets).
Bare UNION queries will get broken, since something like
select c1,c2 from t1
UNION select c1,c2 from t2
will get turned into
select count(*) from t1
UNION select c1,c2 from t2
All this is completely untested, just my thoughts based on oddball stuff I've had to do over the years.

Instead of modifying the existing clauses of the query - how about inserting a new clause, the INTO clause.
SELECT *
INTO #MyCountTable -- new clause to create a temp table with these records.
FROM TheTable
SELECT @@ROWCOUNT
-- or maybe this:
--SELECT COUNT(*) FROM #MyCountTable
DROP TABLE #MyCountTable
T-SQL query modification seems to be an eternal struggle to be the last thing that happens.

"Would you post an answer of how to do this 'the right way' using IQueryable?"
Suppose you had some arbitrary query:
IQueryable<Ledger> query = myDataContext.Ledgers
.Where(ledger => ledger.Seq ==
myDataContext.Ledgers
.Where(ledger2 => ledger2.ClientId == ledger.ClientId)
.OrderByDescending(ledger2 => ledger2.Date)
.ThenByDescending(ledger2 => ledger2.Seq)
.Take(1).SingleOrDefault().Seq
)
.Where(ledger => ledger.Balance != 0);
Then you just get the Count of the rows, no need for any custom method or query manipulation.
int theCount = query.Count();
//demystifying the extension method:
//int theCount = System.Linq.Queryable.Count(query);
LinqToSql will include your desire for a count into the query text.
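For readers without a DataContext at hand, the same idea can be tried against an in-memory IQueryable. The Ledger class below and the CountDemo helper are stand-ins invented for this sketch (and First() is used in place of Take(1).SingleOrDefault() for brevity); with LINQ to SQL the Count() would be translated into the SQL text instead of being evaluated in memory.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the mapped Ledger entity.
public class Ledger
{
    public int ClientId { get; set; }
    public DateTime Date { get; set; }
    public int Seq { get; set; }
    public decimal Balance { get; set; }
}

public static class CountDemo
{
    // Counts the latest ledger row per client, keeping only non-zero balances.
    // The trailing OrderBy does not get in the way of Count().
    public static int CountLatestNonZero(IQueryable<Ledger> ledgers)
    {
        IQueryable<Ledger> query = ledgers
            .Where(ledger => ledger.Seq ==
                ledgers
                    .Where(ledger2 => ledger2.ClientId == ledger.ClientId)
                    .OrderByDescending(ledger2 => ledger2.Date)
                    .ThenByDescending(ledger2 => ledger2.Seq)
                    .First().Seq)
            .Where(ledger => ledger.Balance != 0)
            .OrderBy(ledger => ledger.Date);

        return query.Count();
    }
}
```

A list of Ledger objects can be passed in with AsQueryable() to see the count without touching a database.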

I guess you want to drop the order by clause to improve performance. The general case is quite complex, and you will need a full SQL parser to drop the ordering clause.
Also, did you check the comparative performance of
select count(id) from ....
v/s
select count(*) from (select id, a+b from ....)
The problem is that a+b will need to be evaluated in the latter, essentially executing the query twice.
If you want a progress bar because the retrieval itself is slow, then this is completely counter-productive, because you will spend almost the same amount of time estimating the count.
And if the application is complex enough that the data can change between the two query executions, then you don't even know how reliable the count is.
So the real answer is that you cannot get a count on an arbitrary query in an efficient way. For a non-efficient way, if your result set is rewindable, go to the end of the result set, figure out the row count, and then go back to the first row.

What if rather than try to re-build your query, you do something like:
WITH MyQuery AS (
    select [ClientID], [Date], [Balance]
    from [Ledger]
    where Seq = (select top 1 Seq
                 from [Ledger] as l
                 where l.[ClientID] = [Ledger].[ClientID]
                 order by [Date] desc, Seq desc)
      and Balance <> 0
)
SELECT COUNT(*) From MyQuery;
Note I haven't tested this on SQL Server 2005 but it should work.
Update:
We've confirmed SQL Server 2005 does not support an ORDER BY clause within a CTE. This does, however, work with Oracle and perhaps other databases.

I wouldn't edit or try to parse the SQL at all, but you may have to use an EVIL CURSOR (don't worry, we won't explicitly iterate through anything). Here, I would simply pass your ad-hoc SQL to a proc which runs it as a cursor, and returns the number of rows in the cursor. There may be some optimizations available, but I've kept it simple, and this should work for any valid select statement (even CTEs) that you pass to it. No need to code and debug your own T-SQL lexer or anything.
create proc GetCountFromSelect (
    @SQL nvarchar(max)
)
as
begin
    set nocount on
    exec ('declare CountCursor insensitive cursor for ' + @SQL + ' for read only')
    open CountCursor
    select @@cursor_rows as RecordCount
    close CountCursor
    deallocate CountCursor
end
go
exec GetCountFromSelect '// Your SQL here'
go

C# problem querying dbase file; problem using WHERE clause

I'm querying a dbase .dbf file with odbc from within my c# code and am having a problem using a 'where' clause in my query. I can retrieve and read the records fine if I just 'select * from FILE.DBF', and every example I see on web pages as I search for an answer show just that much syntax. I've tried multiple ways of constructing the select statement with a 'where' and so far they all fail. So, I'm wondering whether I just can NOT use a 'where' clause in a query against a dbase file, or whether I simply haven't hit on the correct syntax yet.
I've tried:
select * from FILE.DBF where GROUP = 21;
select * from FILE.DBF where GROUP = '21';
select * from FILE.DBF where GROUP = "21";
The result of all of these is the error: ERROR [42000] [Microsoft][ODBC dBase Driver] Syntax error in WHERE clause.
Any help will be appreciated.

Try surrounding the word GROUP with brackets, as in:
select * from FILE.DBF where [GROUP] = 21;
GROUP is a SQL keyword and it's most likely causing some issues.

GROUP is a keyword used for SQL itself. Try running the same query but with a different 'where' clause, by substituting 'Group' with another field instead (and a different condition too, naturally). If the query works, then 'GROUP' is being mixed up with the SQL syntax for GROUP BY, and thus you might need to use brackets or some other character to enclose the field name.

Using C# to Select from SQL database Table

I have a List of UserIDs and an open connection to SQL Server. How can I loop through this List and select the matching UserID along with the First_Name and Last_Name columns? I assume the output can be in a DataTable?
many thanks

It varies slightly depending on which type of SQL you're running, but this and this should get you started.

The most expedient way of doing this would be to:
Turn the List into a string containing a comma separated list of the userid values
Supply that CSV string into an IN clause, like:
SELECT u.first_name,
u.last_name
FROM USER_TABLE u
WHERE u.userid IN ([comma separated list of userids])
Otherwise, you could insert the values into a temp table and join to the users table:
SELECT u.first_name,
u.last_name
FROM USER_TABLE u
JOIN #userlist ul ON ul.userid = u.userid
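As a variation on the IN-clause approach, the list can be turned into one parameter placeholder per ID instead of pasting the values into the SQL text. The sketch below only builds the SQL string and the name/value pairs; InClauseBuilder and the @id0, @id1, ... naming are assumptions made for this example, and the table and column names are taken from the query above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class InClauseBuilder
{
    // Builds "... WHERE u.userid IN (@id0, @id1, ...)" together with the
    // parameter name/value pairs that should be bound to the command.
    public static (string Sql, List<KeyValuePair<string, int>> Parameters) Build(IList<int> userIds)
    {
        if (userIds == null || userIds.Count == 0)
            throw new ArgumentException("At least one user id is required.", nameof(userIds));

        List<KeyValuePair<string, int>> parameters = userIds
            .Select((id, i) => new KeyValuePair<string, int>("@id" + i, id))
            .ToList();

        string sql =
            "SELECT u.userid, u.first_name, u.last_name FROM USER_TABLE u WHERE u.userid IN (" +
            string.Join(", ", parameters.Select(p => p.Key)) + ")";

        return (sql, parameters);
    }
}
```

Each pair would then be added to the SqlCommand's Parameters collection before filling a DataTable. SQL Server limits a command to 2100 parameters, so for very long lists the temp table approach is the better fit.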

Write a function in your SQL database named ParseIntegerArray. This should convert a comma delimited string into a table of IDs, you can then join to this in your query. This also helps to avoid any SQL injection risk you could get from concatenating strings to build SQL. You can also use this function when working with LINQ to SQL or LINQ to Entities.
DECLARE @itemIds nvarchar(max)
SET @itemIds = '1,2,3'
SELECT
    i.*
FROM
    dbo.Item AS i
    INNER JOIN dbo.ParseIntegerArray(@itemIds) AS id ON i.ItemId = id.Id

This article should help you: http://msdn.microsoft.com/en-us/library/aa496058%28SQL.80%29.aspx
I've used this in the past to create a stored procedure accepting a single comma delimited varchar parameter.
My source from the C# program was a checked list box, and I built the comma delimited string using a foreach loop and a StringBuilder to do the concatenation. There might be better methods, depending on the number of items you have in your list though.
To come back to the SQL part, the fn_Split function discussed in the article, enables you to transform the comma delimited string back to a table variable that SQL Server can understand... and which you can query in your stored procedure.
Here is an example:
CREATE PROCEDURE GetSelectedItems
(
    @SelectedItemsIDs varchar(MAX) -- comma-separated string containing the items to select
)
AS
SELECT * FROM Items
WHERE ItemID IN (SELECT Value FROM dbo.fn_Split(@SelectedItemsIDs, ','))
RETURN
GO
Note that you could also use an inner join, instead of the IN() if you prefer.
If you don't have the fn_Split UDF on your SQL Server, you can find it here: http://odetocode.com/Articles/365.aspx
I hope this helps.
