Problem Description
When using a Dapper parameter in an SQL WHERE-clause, the parameter appears to be case sensitive. However, when I replace the Dapper parameter with a string literal, the WHERE-clause is no longer case sensitive. I have created a simple ASP.NET Core recipe example web API to help illustrate the problem.
In this recipe example, I am using a PostgreSQL database and want to query a recipe table to get a recipe by name. I have made the name column of type citext, which is a case-insensitive string type.
Database Table
A description of the recipe table:
+-------------+--------+-----------+----------+--------------------+
| Column | Type | Collation | Nullable | Default |
+-------------+--------+-----------+----------+--------------------+
| recipe_id | uuid | | not null | uuid_generate_v4() |
| name | citext | | not null | |
| description | text | | | |
+-------------+--------+-----------+----------+--------------------+
The contents of the recipe table are:
+--------------------------------------+--------------------+-----------------------------------------------------------+
| recipe_id | name | description |
+--------------------------------------+--------------------+-----------------------------------------------------------+
| 8f749e7a-e192-48df-91af-f319ab608212 | meatballs | balled up meat |
| f44c696f-a94a-4f17-a387-dd4d42f60ef8 | red beans and rice | yummy new orleans original |
| 82c5911b-feec-4854-9073-6a85ea793dc0 | pasta cereal | couscous and ground meat eaten with a spoon, like cereal! |
+--------------------------------------+--------------------+-----------------------------------------------------------+
Query Method
The RecipeController has a GetByName method that accepts the name parameter as part of the URI path. The GetByName method calls the GetByNameAsync method of the RecipeRepository class, which contains the SQL statement in question:
public async Task<Recipe> GetByNameAsync(string name)
{
string sql = $@"
SELECT *
FROM {nameof(Recipe)}
WHERE {nameof(Recipe)}.{nameof(Recipe.name)} = @{nameof(name)}";
using (IDbConnection connection = Open())
{
IEnumerable<Recipe> recipes = await connection.QueryAsync<Recipe>(sql, new {name});
return recipes.DefaultIfEmpty(new Recipe()).First();
}
}
Query Responses
If I query the meatballs recipe by name, setting the name parameter to "meatballs", I get the following response:
{
"recipe_id": "8f749e7a-e192-48df-91af-f319ab608212",
"name": "meatballs",
"description": "balled up meat"
}
Setting the name parameter equal to "Meatballs", I get the following response:
{
"type": "https://tools.ietf.org/html/rfc7231#section-6.5.4",
"title": "Not Found",
"status": 404,
"traceId": "00-5e4e35d5cfec644fc117eaa96e854854-c0490c8ef510f3b1-00"
}
And finally, if I replace the Dapper name parameter with the string literal "Meatballs":
public async Task<Recipe> GetByNameAsync(string name)
{
string sql = $@"
SELECT *
FROM {nameof(Recipe)}
WHERE {nameof(Recipe)}.{nameof(Recipe.name)} = 'Meatballs'";
using (IDbConnection connection = Open())
{
IEnumerable<Recipe> recipes = await connection.QueryAsync<Recipe>(sql, new {name});
return recipes.DefaultIfEmpty(new Recipe()).First();
}
}
I get the following response:
{
"recipe_id": "8f749e7a-e192-48df-91af-f319ab608212",
"name": "meatballs",
"description": "balled up meat"
}
Why is this Dapper parameter forcing case-sensitivity? And how can I get around this?
Background
As Jeroen pointed out:
Presumably Dapper isn't doing any such thing and the same thing happens from any language where a parameter is passed as a regular string...
This indeed was not an issue with Dapper, but with SQL data types. The name column in the recipeexample database is citext, but the incoming parameter arrives as plain text, so the comparison is performed with case-sensitive text semantics. Therefore, casting the incoming argument to citext is necessary.
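A minimal sketch of that fix (assuming the same recipe table as above; @name is the Dapper parameter placeholder):

```sql
-- Hypothetical fix: cast the parameter so the comparison uses
-- citext (case-insensitive) semantics instead of plain text.
SELECT *
FROM recipe
WHERE recipe.name = CAST(@name AS citext);
-- Postgres shorthand for the same cast: recipe.name = @name::citext
```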
As Jeroen also pointed out:
From what I gather Postgres also supports collations, and using a case-insensitive collation on a regular string type is likely to work without conversion of any kind.
Somehow I missed this, but the Postgres docs even recommend considering nondeterministic collations instead of the citext module. After reading up on localization and collations, and watching this YouTube video, I updated the recipe example web API to compare the citext module with nondeterministic collations.
Database Update
First, I added the case_insensitive collation provided in the PostgreSQL documentation:
CREATE COLLATION case_insensitive (provider = icu, locale = 'und-u-ks-level2', deterministic = false);
Then, I updated the recipe table to have two name columns: name_1 of type text using the case_insensitive collation, and name_2 of type citext:
+-------------+--------+------------------+----------+--------------------+
| Column | Type | Collation | Nullable | Default |
+-------------+--------+------------------+----------+--------------------+
| recipe_id | uuid | | not null | uuid_generate_v4() |
| name_1 | text | case_insensitive | not null | |
| name_2 | citext | | not null | |
| description | text | | | |
+-------------+--------+------------------+----------+--------------------+
Indexes:
"recipe_pkey" PRIMARY KEY, btree (recipe_id)
"recipe_name_citext_key" UNIQUE CONSTRAINT, btree (name_2)
"recipe_name_key" UNIQUE CONSTRAINT, btree (name_1)
Next, I created three Postgres functions to test out the 'Meatballs' query:
The first function queries the name_1 column and takes a text argument
The second function queries the name_2 column and takes a text argument
The third function queries the name_2 column and takes a citext argument
CREATE FUNCTION getrecipe_name1_text(text) RETURNS recipe as $$
SELECT *
FROM recipe
WHERE recipe.name_1 = $1;
$$ LANGUAGE SQL;
CREATE FUNCTION getrecipe_name2_text(text) RETURNS recipe as $$
SELECT *
FROM recipe
WHERE recipe.name_2 = $1;
$$ LANGUAGE SQL;
CREATE FUNCTION getrecipe_name2_citext(citext) RETURNS recipe as $$
SELECT *
FROM recipe
WHERE recipe.name_2 = $1;
$$ LANGUAGE SQL;
Query Tests
Querying the name_1 column with text argument:
recipeexample=# SELECT * FROM getrecipe_name1_text('Meatballs');
+--------------------------------------+-----------+-----------+----------------+
| recipe_id | name_1 | name_2 | description |
+--------------------------------------+-----------+-----------+----------------+
| 8f749e7a-e192-48df-91af-f319ab608212 | meatballs | meatballs | balled up meat |
+--------------------------------------+-----------+-----------+----------------+
(1 row)
Querying the name_2 column with text argument:
recipeexample=# SELECT * FROM getrecipe_name2_text('Meatballs');
+-----------+--------+--------+-------------+
| recipe_id | name_1 | name_2 | description |
+-----------+--------+--------+-------------+
| | | | |
+-----------+--------+--------+-------------+
(1 row)
Querying the name_2 column with citext argument:
recipeexample=# SELECT * FROM getrecipe_name2_citext('Meatballs');
+--------------------------------------+-----------+-----------+----------------+
| recipe_id | name_1 | name_2 | description |
+--------------------------------------+-----------+-----------+----------------+
| 8f749e7a-e192-48df-91af-f319ab608212 | meatballs | meatballs | balled up meat |
+--------------------------------------+-----------+-----------+----------------+
(1 row)
Conclusion
If the citext module is used, arguments must be cast to citext when querying
If the case_insensitive collation is used, comparisons carry a performance penalty and pattern-matching operations (LIKE, regular expressions) are not supported
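For example, against the updated table (a sketch; the exact error text may vary by Postgres version):

```sql
-- Equality works with the nondeterministic collation...
SELECT * FROM recipe WHERE name_1 = 'MEATBALLS';

-- ...but pattern matching does not:
SELECT * FROM recipe WHERE name_1 LIKE 'meat%';
-- ERROR: nondeterministic collations are not supported for LIKE
```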
Related
I have a SQL table below; the values of the Parameter and Parameter Value columns are created dynamically. The design below caters for additional parameters being added at a later stage, so I think using the parameter and parameter value as regular columns is not ideal for such a design.
|---------------------|------------------|------------------|
| Parameter | Parameter Value | Computers |
|---------------------|------------------|------------------|
| Phase | New | PC1 |
|---------------------|------------------|------------------|
| Phase | New | PC2 |
|---------------------|------------------|------------------|
| Phase | Redevelopment | PC3 |
|---------------------|------------------|------------------|
| Cost | High | PC1 |
|---------------------|------------------|------------------|
| Cost | High | PC2 |
|---------------------|------------------|------------------|
| Cost | Cost | PC3 |
|---------------------|------------------|------------------|
Given a scenario where a user searches by Phase = "New" AND Cost = "High", it will result in PC1.
At the moment, this is all I could think of:
SELECT *
FROM projectParameter
WHERE Parameter = 'Phase' AND Value = 'New' AND Parameter = 'Cost' AND Value = 'High'
Thanks in advance!
First, select all rows that match any part of your filtering.
Then aggregate all those rows to get one result per computer.
Then check each result to see if it contains all the required filtering constraints.
SELECT
Computers
FROM
yourTable
WHERE
(Parameter = 'Phase' AND ParameterValue = 'New')
OR
(Parameter = 'Cost' AND ParameterValue = 'High')
GROUP BY
Computers
HAVING
COUNT(*) = 2
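If the table can contain duplicate (Parameter, Parameter Value, Computers) rows, a slightly safer variant (a sketch, not tested against your data) counts distinct parameters instead, so a repeated row can't satisfy the count on its own:

```sql
SELECT Computers
FROM yourTable
WHERE (Parameter = 'Phase' AND ParameterValue = 'New')
   OR (Parameter = 'Cost' AND ParameterValue = 'High')
GROUP BY Computers
HAVING COUNT(DISTINCT Parameter) = 2
```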
From what I understand, it seems you want a list of all computers that have an entry for both of the conditions below:
Parameter = 'Cost' AND Parameter Value = 'High'
Parameter = 'Phase' AND Parameter Value = 'New'
You can try the SQL query below to see if it gives the result you need:
SELECT t.computer
FROM table t
WHERE t.parameter = 'cost'
AND t.parameter_value = 'high'
AND EXISTS (
SELECT computer FROM table where computer=t.computer AND parameter = 'phase' AND parameter_value = 'new');
I have a SQL table similar to (call it UserTable)
+--------+-----------+----------+
| UserId | FirstName | LastName |
+--------+-----------+----------+
| 123 | Bob | Smith |
| 456 | John | Doe |
+--------+-----------+----------+
On a different server I have a table (call it UserBackupTable)
+----------+--------+-----------+----------+
| Location | UserId | FirstName | LastName |
+----------+--------+-----------+----------+
| A | 123 | Bob | Smith |
| B | 456 | John | Doe |
+----------+--------+-----------+----------+
The two tables are identical except for the addition of one column (Location). I would like to backup/copy UserTable to UserBackupTable in a better way than
var dataToCopy = userDb.UserTable.Select(user => new UserBackup
{
Location = _location,
FirstName = user.FirstName,
LastName = user.LastName
}).ToList();
backupDb.UserBackups.AddRange(dataToCopy);
This works, but isn't very efficient when I have 40+ columns to type out manually. These are database-first models, in case that is needed.
Can you not just do this at the database layer, rather than via an ORM, e.g.
SELECT *
INTO [backupDb].[dbo].[UserBackups]
FROM [dbo].[UserTable]
You'll need to modify the above depending on how you do it, e.g. incrementally, or by recreating the whole backup table each time, depending on the size of your data (just make sure you drop the existing table and SELECT INTO its replacement as part of a transaction). You could then automate it via SQL Server Agent or something similar.
Note that SELECT INTO creates a new target table, so that wouldn't be suitable for an incremental backup approach.
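For an incremental approach, an INSERT ... SELECT into the existing backup table is the usual alternative. A sketch based on the example tables above (`@location` and the use of UserId as the key are assumptions):

```sql
-- Copy only rows not yet present in the backup table for this location.
INSERT INTO [backupDb].[dbo].[UserBackups] (Location, UserId, FirstName, LastName)
SELECT @location, u.UserId, u.FirstName, u.LastName
FROM [dbo].[UserTable] u
WHERE NOT EXISTS (
    SELECT 1
    FROM [backupDb].[dbo].[UserBackups] b
    WHERE b.UserId = u.UserId AND b.Location = @location
);
```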
I have one central and two client database that have the same structure (identity id). The application allows users to merge data of selected tables from central db with one client db at a time.
For example:
[db_central].[table]:
+-------+-------+
| id | name |
+-------+-------+
| 1 | A |
| 2 | B |
| 3 | C |
+---------------+
[db_client_1].[table]:
+-------+-------+
| id | name |
+-------+-------+
| 3 | D |
+---------------+
[db_client_2].[table]:
+-------+-------+
| id | name |
+-------+-------+
| 3 | E |
+---------------+
Expected result after merging (twice):
[db_central].[table]:
+-------+-------+
| id | name |
+-------+-------+
| 1 | A |
| 2 | B |
| 3 | C |
| 4 | D |
| 5 | E |
+---------------+
Currently, I'm only able to load tables from database.
When user clicks "Manual Sync" button, the app will compare and merge data of selected tables from left to right database or vice versa.
If table doesn't exist, it will create the new one. If table does exist, it will compare and merge data but I don't know what is the best solution to accomplish this task.
Any suggestion would be appreciated.
This seems like a simple SQL query (if the databases are on the same server, or if you have a linked server between them) using EXCEPT...
insert into [db_central].[table] (name)  -- target table
select name from [db_client_1].[table]  -- source table
except
select name from [db_central].[table]   -- target table
If you have to do it in Linq, then it's very similar:
// Get the list of names to add
var newNames = dbContext.db_client_1.Select(e => e.name).Except(dbContext.db_central.Select(e => e.name));
// Convert the names to entity objects and add them.
dbContext.db_central.AddRange(newNames.Select(n => new db_central { name = n }));
dbContext.SaveChanges();
This is assuming you don't want duplicates in db_central
Ideally you should have two columns in the table in the central database:
Primary key (with identity enabled)
ChildKey (the primary key from the child databases)
The primary key column in the central database will take care of ordering, and the child-key column will give you the primary key in the respective database.
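As a sketch of how that could look (table and column names are assumptions based on the example), the merge then keys on the client's id rather than on the name:

```sql
-- ChildKey preserves the client's identity value; the central table's
-- identity column generates the new ids (4, 5, ...) on insert.
-- With more than one client, a source/client column would also be
-- needed to disambiguate ids (an assumption beyond the answer above).
INSERT INTO [db_central].[table] (name, ChildKey)
SELECT c.name, c.id
FROM [db_client_1].[table] c
WHERE NOT EXISTS (
    SELECT 1 FROM [db_central].[table] t
    WHERE t.ChildKey = c.id
);
```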
I am designing a database and have a theory question about which solution works better for running queries: being faster on Microsoft SQL Server, or simply being more relational.
GIVEN
Lets say, we have the following Tables:
Congress, Person, Session, Room, and much more.
Don't mind the given names; these are just some basic standalone entities.
-----------------------------------------------------------------
| Congress | Person | Session | Room |
-----------------------------------------------------------------
| CongressID | PersonID | SessionID | RoomID |
| Name | Name | Name | Name |
| ... | ... | ... | ... |
-----------------------------------------------------------------
Additionally we have a table called "Right". Rights have a name and can define access to something like one or many of the basic entities. Each person can have those rights assigned.
So there are 2 more tables:
Right and PersonRight
---------------------------------
| Right | PersonRight |
---------------------------------
| RightID | PersonRightID |
| Name | PersonID |
| ... | RightID |
| ... | ... |
---------------------------------
SOUGHT-AFTER
Now there is only one thing missing: the way or table that represents the relations to the other entities. I know three different ways that will all work, but I don't have the deep experience to decide which one is best.
1. The relational way?
Upgrade: For every new entity, add a new table
Relation: Right 1 : N Entities
Pros: Adding new entities doesn't affect the others in any way, foreign keys to entities
Cons: Many tables with maybe redundant columns like CreatedDate or rowguid.
SQL Example:
select *
from Right r
left join RightCongress rc on r.RightID = rc.RightID
left join RightSession rs on r.RightID = rs.RightID
left join RightRoom ro on r.RightID = ro.RightID
left join Congress ec on rc.CongressID = ec.CongressID
left join Session es on rs.SessionID = es.SessionID
left join Room er on ro.RoomID = er.RoomID
-------------------------------------------------------
| RightCongress | RightSession | RightRoom |
-------------------------------------------------------
| RightCongressID | RightSessionID | RightRoomID |
| RightID | RightID | RightID |
| CongressID | SessionID | RoomID |
| ... | ... | ... |
-------------------------------------------------------
2. The column way?
2.1 The column way 1
Upgrade: For every new entity, add a new column to table "Right"
Relation: Right 1 : 1 Entities
Pros: No new table required, small statement, foreign keys to entities
Cons: Every new entity affect all other rows, only 1:1 relation possible, column count maybe confusing
SQL Example:
select *
from Right r
left join Congress ec on r.CongressID = ec.CongressID
left join Session es on r.SessionID = es.SessionID
left join Room er on r.RoomID = er.RoomID
-----------------
| Right |
-----------------
| RightID |
| Name |
| CongressID |
| SessionID |
| RoomID |
-----------------
2.2 The column way 2
Upgrade: For every new entity, add a new column to table "RightReference"
Relation: Right 1 : N Entities
Pros: 1:N relation, only one new table, small statement, foreign keys to entities
Cons: Every new entity affect all other rows, column count maybe confusing
SQL Example:
select *
from Right r
inner join RightReference rr on r.RightID = rr.RightID
left join Congress ec on rr.CongressID = ec.CongressID
left join Session es on rr.SessionID = es.SessionID
left join Room er on rr.RoomID = er.RoomID
---------------------------------------
| Right | RightReference |
---------------------------------------
| RightID | RightReferenceID |
| Name | RightID |
| ... | CongressID |
| ... | SessionID |
| ... | RoomID |
| ... | ... |
---------------------------------------
3. The reference way
Upgrade: For every new entity, add a new row to RightReference with the new ReferenceTypeID
Relation: Right 1 : N Entities
Pros: Only one new table and dynamic references
Cons: Anonymous references and have always to remember the indexes to build queries, no foreign keys to entities
Explanation: ReferenceID is the primary ID of the referenced entity/row, e.g. of table Congress, Session and so on, so by itself you can't tell which table it references. For that reason there is ReferenceTypeID. It points to a translation table called ReferenceType, where every table is stored with a unique id. Maybe it is possible to use the system function OBJECT_ID instead.
SQL Example:
select *
from Right r
inner join RightReference rr on r.RightID = rr.RightID
left join Congress ec on rr.ReferenceID = ec.CongressID and rr.ReferenceTypeID = 1
left join Session es on rr.ReferenceID = es.SessionID and rr.ReferenceTypeID = 2
left join Room er on rr.ReferenceID = er.RoomID and rr.ReferenceTypeID = 3
----------------------------------------------------------
| Right | RightReference | ReferenceType |
----------------------------------------------------------
| RightID | RightReferenceID | ReferenceTypeID |
| Name | RightID | Name |
| ... | ReferenceID | ... |
| ... | ReferenceTypeID | ... |
| ... | ... | ... |
----------------------------------------------------------
And now to all the SQL experts:
What is the best, or let's say state-of-the-art, solution/approach to handle this task?
If you have other ways, please let me know.
What I am looking for is: general advantages and disadvantages, SQL performance, implementation difficulties with Entity Framework, and anything else you know or think about it.
Thanks!
Usually when dealing with relational databases, anything that requires a schema change is a no-no, because you have to perform potentially dangerous operations on your SQL server, update EF as well as whatever models you may be using, and probably redeploy whatever application serves as the frontend for your database.
The SQL Solution
If you're OK with committing a no-no every time a new entity is added or are for some other reason tied to an RDBMS, you have two options:
If you care about your entity(Congress, Session, Room) table schema
Column way #2 is probably the best idea because it separates relational data from actual table data. Make a separate table for the relationships between entities and rights and put an index on every possible entityId. In your example you'd need indices on CongressId, SessionId and RoomId columns.
If you don't care about your entity table schema
Combine all entity tables into one large Entities table with an Id column and an XML column that contains all your actual entity info, such as the type. A single relationship between Entities and Rights and you're good.
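A sketch of that combined layout (table, column names, and types are all assumptions):

```sql
CREATE TABLE Entities (
    EntityId   int IDENTITY(1,1) PRIMARY KEY,
    EntityType nvarchar(50) NOT NULL,  -- 'Congress', 'Session', 'Room', ...
    Payload    xml NULL                -- all remaining entity attributes
);

-- Single junction table covering every entity type.
CREATE TABLE EntityRight (
    EntityId int NOT NULL REFERENCES Entities (EntityId),
    RightID  int NOT NULL REFERENCES [Right] (RightID),
    PRIMARY KEY (EntityId, RightID)
);
```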
The NoSQL Solution
If you can go this route, it would probably suit the flexible structure you're looking for much better. You will still need to update the code that accesses the document store but judging from your proposal that seems unavoidable unless you have some extraordinarily-flexible-but-error-prone code in place.
You don't need to do schema/EF updates every time a new entity type is added, and you don't need to worry about relationships. Your Person objects will have all their rights nested right inside and will be stored in the document store exactly that way.
I'm making a program and I need to query the database for the string that appears most often in a given column. In this example, it's "stringONE".
----------------------------
| ID | Column (string) |
----------------------------
| 1 | stringONE |
----------------------------
| 2 | stringTWO |
----------------------------
| 3 | stringONE |
----------------------------
| 4 | stringONE |
----------------------------
Now I need to take the name of the string that appears the most and put it into a variable string, for example:
string most_appeared_string = sql.ExecuteScalar();
Also, what happens if there is no single string that appears the most, but rather two or more strings that appear the same number of times, like this:
----------------------------
| ID | Column (string) |
----------------------------
| 1 | stringONE |
----------------------------
| 2 | stringTWO |
----------------------------
| 3 | stringTWO |
----------------------------
| 4 | stringONE |
----------------------------
Thanks ahead.
@KeithS
Do you have an SQL Server version of the query? I'm getting some errors when trying it there. Here's a table example of what I'd like to do precisely.
------------------------------------------------
| ID | column1 (string) | author (string) |
------------------------------------------------
| 1 | string-ONE | John |
------------------------------------------------
| 2 | string-TWO | John |
------------------------------------------------
| 3 | string-ONE | Martin |
------------------------------------------------
| 4 | string-ONE | John |
------------------------------------------------
SELECT TOP (1) column1, COUNT(*) FROM table WHERE author='John' ORDER BY ID
It should return "string-ONE" since it appears the most (2 times) for the author John. When trying the query in MS SQL Management Studio, though, this is the error I'm getting:
Column 'table.column1' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Nevermind the edit. Thank you.
This is a pretty easy query (in T-SQL at least):
select top 1 Column, Count(*) from Table group by Column order by Count(*) desc
ExecuteScalar, by an implementation detail, will return the string value because it's the first column of the only row in the result set, even though there are two columns. You could also use ExecuteReader to access the number of times that string occurs.
select top (1) SomeCol, count(*) as Row_Count
from YourTable
group by SomeCol
order by Row_Count desc
Also, what happens if there is no single string that appears the most, but rather
two or more strings that appear the same number of times, like this:
In that case, using the above query, you will get one arbitrary row. You can add with ties to get all rows that have the same highest value.
select top (1) with ties SomeCol, count(*) as Row_Count
from YourTable
group by SomeCol
order by Row_Count desc
SELECT max(counted) AS max_counted FROM (
SELECT count(*) AS counted FROM counter GROUP BY date
) AS counts
This could do the trick