RegEx to extract parameters from a SQL stored procedure definition - c#

I'm attempting to come up with a RegEx pattern to identify the string of parameters in a stored procedure definition. It's rather obtuse, considering all the various possibilities, but here's what I have so far (with global flags set for case insensitivity and single-line mode):
(?:create proc.*?)((?:[@]\w+\s+[A-Za-z0-9_()]+(?:\s*=\s*\S+)?(?:\s*,\s*)?)+)(?:.*?with\s+(?:native_compilation|schemabinding|execute\s+as\s+\S+))?(?:[\s\)]*?as)
I've set up several unit tests with a variety of stored proc definitions - with or without parameters, with or without defaults, etc. - and it seems to work in all cases except for one (that I've found so far):
CREATE PROCEDURE [dbo].[sp_creatediagram]
(
@diagramname sysname,
@owner_id int = null,
@version int,
@definition varbinary(max)
)
WITH EXECUTE AS 'dbo'
AS
BEGIN
set nocount on
...
Obviously I'd expect the first capture group to capture all four parameters...
@diagramname sysname,
@owner_id int = null,
@version int,
@definition varbinary(max)
...but for some reason, the RegEx search halts after the second parameter - notably the one that includes a default assignment - and doesn't proceed to capture the remaining two parameters. The first capture group ends up looking like this:
@diagramname sysname,
@owner_id int = null,
I won't be the least bit surprised to learn that I'm grossly overcomplicating this, but I do feel like I'm really close. I imagine there must be something about the way the RegEx engine works that I'm not quite understanding. Any help is hugely appreciated. Thanks very much in advance.
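One plausible reading of the behaviour described above, sketched in Python's re module rather than .NET (an assumption made purely so the check is easy to run): the \S+ that matches the default value also swallows the trailing comma, so the optional comma-and-whitespace group has nothing left to consume, the next iteration cannot start at a newline, and the rest of the pattern still succeeds, so the engine never backtracks. The ADJUSTED pattern is a hypothetical tweak, not a complete fix; it would still break on quoted defaults that contain commas or spaces.

```python
import re

# The question's pattern, written with T-SQL's @ parameter prefix. The flags
# mirror the case-insensitive and single-line options mentioned in the question.
FLAGS = re.IGNORECASE | re.DOTALL
ORIGINAL = (
    r"(?:create proc.*?)"
    r"((?:[@]\w+\s+[A-Za-z0-9_()]+(?:\s*=\s*\S+)?(?:\s*,\s*)?)+)"
    r"(?:.*?with\s+(?:native_compilation|schemabinding|execute\s+as\s+\S+))?"
    r"(?:[\s\)]*?as)"
)
# Hypothetical tweak: a default value may not contain whitespace or a comma,
# so the comma is left for the separator group.
ADJUSTED = ORIGINAL.replace(r"(?:\s*=\s*\S+)?", r"(?:\s*=\s*[^\s,]+)?")

PROC = """CREATE PROCEDURE [dbo].[sp_creatediagram]
(
@diagramname sysname,
@owner_id int = null,
@version int,
@definition varbinary(max)
)
WITH EXECUTE AS 'dbo'
AS
BEGIN
set nocount on
"""


def parameter_names(pattern, sql):
    """Return the @names found inside the first capture group, or [] if no match."""
    match = re.search(pattern, sql, FLAGS)
    return re.findall(r"@\w+", match.group(1)) if match else []


print(parameter_names(ORIGINAL, PROC))
print(parameter_names(ADJUSTED, PROC))
```

With the original pattern the capture stops after @owner_id; with the adjusted default-value class all four parameter names come back.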

Related

How can I load a .NET DataTable schema from a UDT Table declared in our DB?

I've searched every way I can come up with, but can't find a technique for initializing a DataTable to match a UDT Table declared in our DB. I could manually go through and add columns, but I don't want to duplicate the structure in both places. For a normal table, one option would be to simply issue a "select * where ..." that returns no results. But can something like this be done for a UDT Table?
And here is the background problem.
This DB has a sproc that accepts a Table Valued Parameter that is an instance of the indicated UDT Table declared in the same DB. Most of the UD fields are nullable, and the logic to load the TVP is quite involved. What I hoped to do is initialize the DT, then insert rows as needed and set required column/field values as I go until I'm ready to toss the result to SS for final processing.
I can certainly add the dozen or more fields in code, but the details are still in flux (and may continue to be so for some time), which is one reason I don't really want to have to load all the columns in code.
So, is there a reasonable solution, or am I barking up the wrong tree? I've already spent more time looking for the solution I expected to exist than it would have taken to write the column loading code 100 times over, but now I just want to know if it's possible.
OK, I was discussing this with a friend who is MUCH more SQL savvy than I am (doesn't take much), and he suggested the following SQL query:
"DECLARE @TVP as MyUDTTable; SELECT * FROM @TVP"
This appears to give me exactly what I want, so I'm updating here should some other poor sap want something similar in the future. Perhaps others may offer different or better answers.
Here is an example of how I did this. This style of input/output is something a co-worker and I put together to allow quick and effective use of Entity Framework on his side while keeping my options open to use all the SQL toys. If you have the same use case, you might also like the OUTPUT clause I used here. It returns the newly created ids straight to whatever method calls the proc, allowing the program to go right on to the next activity without pestering my database for the numbers.
My Udt
CREATE TYPE [dbo].[udtOrderLineBatch] AS TABLE
(
[BrandId] [bigint] NULL,
[ProductClassId] [bigint] NULL,
[ProductStatus] [bigint] NULL,
[Quantity] [bigint] NULL
)
and the procedure that takes it as an input
create procedure [ops].[uspBackOrderlineMultipleCreate]
@parmBackOrderId int
,@UserGuid uniqueidentifier
null
,@parmOrderLineBatch as udtOrderLineBatch readonly
as
begin
insert ops.OrderLine
(
BrandId
,ProductClassId
,ProductStatusId
,BackOrderId
,OrderId
,DeliveryId
,CreatedDate
,CreatedBy)
output cast(inserted.OrderLineId as bigint) OrderLineId
select line.BrandId
,line.ProductClassId
,line.ProductStatus
,@parmBackOrderId
,null
,null
,getdate()
,@UserGuid
from @parmOrderLineBatch line
join NumberSequence seq on line.Quantity >= seq.Number
end

Fix html encoded text stored in the database

I have a SQL Server db that has a table which stores a plain text value in an nvarchar column. Unfortunately there was a bug in the C# code that was running Encoder.HtmlEncode() on Chinese characters before inserting them into the table, e.g. a text value of 您好 is being stored in the table as &#x60A8;&#x597D;
Is there any way I can clean up this data using just T-SQL? This database is heavily locked down, so I can't easily run any code against it other than T-SQL.
From what the problem seems to be, you have an option.
You could create a temp table that stores the HTML entity for each character. For example:
CREATE TABLE dbo.TempHost
(
Entity varchar(255),
[Character] nvarchar(255)
)
Then you can actually find the data as csv online (http://www.khngai.com/chinese/charmap/tbluni.php?page=0 or copy and paste to excel), and import it into the table. From there on, all you will need to do is to scan the data and call REPLACE() function and update.
This is a fun challenge, and by fun I mean not really fun. T-SQL is quite bad at string manipulation. To make it even better, HTML entities actually encode a Unicode code point, and there is no simple way of converting that to a Unicode character in T-SQL.
Using a lookup table is probably the most viable method, in that it's likely to be more efficient than what I'm going to propose here: use a function to do the entity replacement. Warning: scalar-valued functions perform horribly in T-SQL and string manipulation is none too fast either. Nevertheless, I present this for, um, inspirational purposes:
CREATE FUNCTION dbo._ConvertEntities(@in NVARCHAR(MAX)) RETURNS NVARCHAR(MAX) AS BEGIN
WHILE 1 = 1 BEGIN;
DECLARE @entityStart INT = CHARINDEX('&#x', @in);
IF @entityStart = 0 BREAK;
DECLARE @entityEnd INT = CHARINDEX(';', @in, @entityStart);
DECLARE @entity VARCHAR(MAX) = SUBSTRING(@in, @entityStart + LEN('&#x'), @entityEnd - @entityStart - LEN('&#x'));
IF @entity NOT LIKE '[0-9A-F][0-9A-F][0-9A-F][0-9A-F]' RETURN @in;
DECLARE @entityChar NCHAR(1) = CONVERT(NCHAR(1), CONVERT(BINARY(2), REVERSE(CONVERT(BINARY(2), @entity, 2))));
SET @in = STUFF(@in, @entityStart, @entityEnd - @entityStart + 1, @entityChar);
END;
RETURN @in;
END;
Aside from performance issues, this function has the major shortcoming that it only works for entities of the form &#x????;, with ???? four hexadecimal digits. It fails quite badly for other entities (like those needing surrogates, those coded as decimal, or special entities like &quot;). I've made it bail out in this case. Although it's fairly easy to extend it to handle single-byte entities, extending it to >4 would be agony.
Realistically, you want to do this in client software using a real programming language. Even if the database is sufficiently locked down that you cannot directly execute queries, you are presumably able to query data if it's not too much, and you can insert data back using generated statements (a lot of them if need be). Terribly slow, but more or less viable.
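As a rough illustration of the client-side route, here is a minimal sketch in Python (chosen only for brevity; the same idea carries over to any client language). It relies on the standard library's html.unescape, which handles hexadecimal, decimal and named references. The fix_rows helper and the (id, text) row shape are hypothetical; fetching the rows and writing them back with generated UPDATE statements is left out.

```python
import html


def fix_rows(rows):
    """Yield (row_id, decoded_text) only for rows whose text actually changes.

    `rows` is assumed to be an iterable of (row_id, text) pairs that were
    read from the table beforehand.
    """
    for row_id, text in rows:
        decoded = html.unescape(text)
        if decoded != text:
            yield row_id, decoded


sample = [
    (1, "&#x60A8;&#x597D;"),     # hexadecimal references
    (2, "plain text"),           # nothing to decode, so it is skipped
    (3, "&#24744; &quot;hi&quot;"),  # decimal and named references
]
for row_id, decoded in fix_rows(sample):
    print(row_id, decoded)
```

Skipping unchanged rows keeps the number of generated statements down, which matters if every update has to be pushed through a restricted channel.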
For completeness, I also mention the option of running CLR code in SQL Server using CLR integration. This requires that the server already allows this or that you can reconfigure it to allow it (improbable if it's "heavily locked down"). The main reason this would be attractive is because it's definitely easier and faster to decode the entities in CLR code, and using CLR integration means you're not using client code (so the data doesn't leave the server). On the other hand, since you need administrative access to the machine to deploy the assembly, this would seem to be a theoretical advantage at best. As far as performance goes, though, it probably can't be beat.
You could take advantage of the fact that the stored entities all start with "&#x" and are eight characters long. You could loop through the table, cutting out the bad characters with something like the example below.
DECLARE @str VARCHAR(100)
SET @str = 'Hello &#x9836;&#x9834;World'
DECLARE @pos int SELECT @pos = CHARINDEX('&#x', @str)
WHILE @pos > 0
BEGIN
-- keep the text before the entity, then everything after its 8 characters
SET @str = LEFT(@str, @pos - 1) + RIGHT(@str, LEN(@str) - @pos - 7)
SELECT @pos = CHARINDEX('&#x', @str)
END
SELECT @str
HTML encoding is not the same as XML encoding, but thanks to this question, I've realized there is an embarrassingly simple way of achieving this:
SELECT
REPLACE(
CONVERT(NVARCHAR(MAX),
CONVERT(XML,
REPLACE(REPLACE(_column_, '<', '&lt;'), '"', '&quot;')
)
),
'&lt;', '<'
)
Stick this in an UPDATE and you're done. Well, almost -- if the code contains non-XML escaped entities like &eacute;, you'd need to replace these separately. Also, we do need to dance around the issue of XML escaping (hence the &lt; replacing in case there's a < somewhere).
It may still need some refinement, but this sure looks a lot more promising than a scalar-valued function. :-)

Using IN operator with Stored Procedure Parameter

I am building a website in ASP.NET 2.0. Some description of the page I am working on:
A ListView displaying a table (of posts) from my Access db, and a ListBox in multiple-select mode used to filter rows (by forum name, value=forumId).
I am converting the ListBox selected values into a List, then running the following query.
Parameter:
OleDbParameter("@Q",list.ToString());
Procedure:
SELECT * FROM sp_feedbacks WHERE forumId IN ([@Q])
The problem is, well, it doesn't work. Even when I run it from MSACCESS 2007 with the string 1,4, "1","4" or "1,4" I get zero results. The query works when only one forum is selected. (In (1) for instance).
SOLUTION?
So I guess I could use WHERE with many OR's but I would really like to avoid this option.
Another solution is to convert the DataTable into list then filter it using LINQ, which seems very messy option.
Thanks in advance,
BBLN.
I see 2 problems here:
1) list.ToString() doesn't do what you expect. Try this:
List<int> foo = new List<int>();
foo.Add(1);
foo.Add(4);
string x = foo.ToString();
The value of "x" will be "System.Collections.Generic.List`1[System.Int32]" not "1,4"
To create a comma separated list, use string.Join().
2) OleDbParameter does not understand arrays or lists. You have to do something else. Let me explain:
Suppose that you successfully use string.Join() to create the parameter. The resulting SQL will be:
SELECT * FROM sp_feedbacks WHERE forumId IN ('1,4')
The OLEDB provider knows that strings must have quotation marks around them. This is to protect you from SQL injection attacks. But you didn't want to pass a string: you wanted to pass either an array, or a literal unchanged value to go into the SQL.
You aren't the first to ask this question, but I'm afraid OLEDB doesn't have a great solution. If it were me, I would discard OLEDB entirely and use dynamic SQL. However, a Google search for "parameterized SQL array" resulted in some very good solutions here on Stack Overflow:
WHERE IN (array of IDs)
Passing an array of parameters to a stored procedure
Good Luck! Post which approach you go with!
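To make the two points above concrete, here is a small sketch using Python's sqlite3 module as a stand-in for the real provider (an assumption; OLEDB and Access differ in detail, and the feedbacks table is hypothetical). It shows that a joined string bound to a single parameter is compared as one value, and that generating one placeholder per id keeps the query parameterised while still matching each id.

```python
import sqlite3


def fetch_by_forum_ids(conn, forum_ids):
    """Return (id, forumId) rows for the given ids, binding one parameter per id."""
    if not forum_ids:
        return []
    placeholders = ", ".join("?" for _ in forum_ids)
    sql = (
        "SELECT id, forumId FROM feedbacks "
        f"WHERE forumId IN ({placeholders}) ORDER BY id"
    )
    return conn.execute(sql, list(forum_ids)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feedbacks (id INTEGER PRIMARY KEY, forumId INTEGER)")
conn.executemany(
    "INSERT INTO feedbacks (id, forumId) VALUES (?, ?)",
    [(1, 1), (2, 4), (3, 10)],
)

# One parameter holding the text '1,4' is a single value, so no row matches.
joined = conn.execute(
    "SELECT id FROM feedbacks WHERE forumId IN (?)", ("1,4",)
).fetchall()

# One placeholder per id matches forums 1 and 4 but not 10.
expanded = fetch_by_forum_ids(conn, [1, 4])
print(joined, expanded)
```

Only the placeholder list is built by string formatting; the values themselves still travel as parameters, so this does not reopen the injection problem.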
When you have:
col in ('1,4')
This tests that col is equal to the string '1,4'. It is not testing for the values individually.
One way to solve this is using like:
where ',' & @Q & ',' like '*,' & col & ',*'
The idea is to add delimiters to each string. So, a value of "1" in the column becomes ",1,". A value of "1,4" for @Q becomes ",1,4,". Now when you do the comparison, there is no danger that "1" will match "10".
Note (for those who do not know). The wildcard for like is * rather than the SQL standard %. However, this might differ depending on how you are connecting, so use the appropriate wildcard.
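The delimiter trick can be checked outside the database. The sketch below mirrors it in Python with a plain substring test instead of LIKE (the function names are hypothetical, and it assumes the list contains no spaces around the commas).

```python
def wrap(text):
    """Surround a value with the delimiter so partial matches cannot occur."""
    return f",{text},"


def id_in_list(candidate, selected_csv):
    """True when candidate appears as a whole item in the comma-separated list."""
    return wrap(candidate) in wrap(selected_csv)


print(id_in_list(1, "1,4"))   # 1 is an item of the list
print(id_in_list(10, "1,4"))  # 10 is not, even though it starts with 1
print(id_in_list(1, "10,4"))  # 1 is not an item, even though 10 contains it
```

Without the wrapping, a naive substring test would report "1" as present in "10,4", which is exactly the false match the delimiters prevent.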
Passing such a condition to a query has always been a problem. To a stored procedure it is worse because you can't even adjust the query to suit. 2 options currently:
use a table valued parameter and pass in multiple values that way (a bit of a nuisance to be honest)
write a "split" multi-value function as either a UDF or via SQL/CLR and call that from the query
For the record, "dapper" makes this easy for raw commands (not sprocs) via:
int[] ids = ...
var list = conn.Query<Foo>(
"select * from Foo where Id in @ids",
new { ids } ).ToList();
It figures out how to turn that into parameters etc for you.
Just in case anyone is looking for an SQL Server Solution:
CREATE FUNCTION [dbo].[SplitString]
(
@Input NVARCHAR(MAX),
@Character CHAR(1)
)
RETURNS @Output TABLE (
Item NVARCHAR(1000)
)
AS BEGIN
DECLARE @StartIndex INT, @EndIndex INT
SET @StartIndex = 1
-- make sure the input ends with the delimiter so the last item is picked up
IF RIGHT(@Input, 1) <> @Character
BEGIN
SET @Input = @Input + @Character
END
WHILE CHARINDEX(@Character, @Input) > 0
BEGIN
SET @EndIndex = CHARINDEX(@Character, @Input)
INSERT INTO @Output(Item)
SELECT SUBSTRING(@Input, @StartIndex, @EndIndex - 1)
SET @Input = SUBSTRING(@Input, @EndIndex + 1, LEN(@Input))
END
RETURN
END
Given an array of strings, I convert it to a comma-separated string using the following code:
var result = string.Join(",", arr);
Then I pass the parameter as follows:
Command.Parameters.AddWithValue("@Parameter", result);
Then, in the stored procedure definition, I use the parameter as follows (the function returns a table, so it has to be selected from):
select * from [dbo].[WhateverTable] where [WhateverColumn] in (select Item from dbo.SplitString(@Parameter, ','))

Use NVARCHAR variables in ASP.NET without declaring their length

I have this query:
SELECT COUNT(Email) FROM Blacklist WHERE (Email = @email OR (Email like '@%' AND @email like '%' + Email)) AND (CustomerId = @cid OR CustomerId = -1)
I want to see if the value in the blacklist starts with an @, and if it does I also want to check whether the parameter ends with the value in the blacklist.
(Email like '@%' AND @email like '%' + Email)
This works in SqlManager if I declare the variables exactly as they are in the table, like this:
declare @email as nvarchar(200) = 'firstname.lastname@xyz.com'
declare @cid as integer = 2
SELECT COUNT(Email) FROM Blacklist WHERE (Email = @email OR (Email like '@%' AND @email like '%' + Email)) AND CustomerId = @cid
The value in the blacklist is "@xyz.com".
If I remove the (200) part from nvarchar(200) it stops working.
So my question is: how do I solve this from .NET C#?
db.AddParameter("@email", SqlDbType.NVarChar, email);
You don't say what "stop working" means, what happens? Do you get the wrong result, or an error? And why do you remove the length from the declaration?
If you do not declare a length for an nvarchar variable, it is 1 by default, so it would have the value N'f' in this case; I guess you get a count of zero instead of whatever number you expect? Note that nvarchar(200) is not "fixed length", it means "maximum 200 characters"; nchar(200) would indeed be fixed length.
In any case, it's not really clear what your problem is here: the string length is simply part of the variable declaration in TSQL and it isn't clear why you can't do exactly what you showed above. You could declare the variable as nvarchar(max) to avoid dealing with specific lengths, if that's your issue.
Whether or not this is useful for you probably depends on how you connect to the database (you mentioned C# but are you using ADO, LINQ or something else?).
EDIT: In C#, you may be looking for the size parameter for the SqlDatabase.AddParameter method, if db in your question is indeed an SqlDatabase.
The problem is (I work with Fredrik) that when the query is executed from C# code, it returns 0, i.e. not blacklisted. When executed from SQL Management Studio, where the length of the parameter can (must) be specified, it returns 1, i.e. IS blacklisted.
Our guess was that the parameter needs a specified size (the same as the database field). However, it seems to work from Management Studio even if the parameter is declared as nvarchar(100).
The question remains, why does the query return 0 when executed from code?
I had the same issue as you, and after much trial-and-error/digging, I think I've figured it out.
I've made a SQL Fiddle to demonstrate this issue: http://sqlfiddle.com/#!3/c0a51/3
Basically, even though you can set an NVARCHAR to strings of any length, it only saves off as many characters as it is defined to have. Since the default is a length of 1, the only character included for your LIKE statement is the first one, in your case, the first character in the email address.
Which is basically what Marc_S said, but with an explanation of why x = 'ab' was only giving us 'a'.
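The effect of that truncation on the blacklist check can be imitated with a small sketch. It uses Python's sqlite3 as a stand-in (an assumption: SQLite does not truncate variables itself, so the declared_length argument simulates a T-SQL variable that keeps only that many characters, and || replaces T-SQL's + for concatenation).

```python
import sqlite3

QUERY = """
SELECT COUNT(Email) FROM Blacklist
WHERE (Email = :email OR (Email LIKE '@%' AND :email LIKE '%' || Email))
  AND (CustomerId = :cid OR CustomerId = -1)
"""


def blacklist_hits(conn, email, cid, declared_length=None):
    """Run the check; declared_length mimics a variable that keeps only N characters."""
    if declared_length is not None:
        email = email[:declared_length]
    return conn.execute(QUERY, {"email": email, "cid": cid}).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Blacklist (Email TEXT, CustomerId INTEGER)")
conn.execute("INSERT INTO Blacklist VALUES ('@xyz.com', 2)")

address = "firstname.lastname@xyz.com"
print(blacklist_hits(conn, address, 2, declared_length=200))  # full value kept
print(blacklist_hits(conn, address, 2, declared_length=1))    # only 'f' survives
```

With 200 characters the domain entry matches; with a single character the value is just "f" and the count drops to zero, which is the symptom seen when the length is left off a hand-written declaration.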
UPDATE
With more digging, I found that this is true of what happens if you write your SQL out like I did in the Fiddle. However, .NET will correctly infer the length of an NVARCHAR from the length of your string and declare the variable appropriately, so it won't have this problem: http://msdn.microsoft.com/en-us/library/hex23w80. So your query in studio is not behaving how your application is behaving.

What are the different ways of handling 'Enumerations' in SQL server?

We currently define a list of constants (mostly these correspond to enumerations we have defined in the business layer) at the top of a stored procedure like so:
DECLARE @COLOR_RED INT = 1
DECLARE @COLOR_GREEN INT = 2
DECLARE @COLOR_BLUE INT = 3
But these often get repeated for many stored procedures so there is a lot of duplication.
Another technique I use if the procedure needs just one or two constants is to pass them in as parameters to the stored procedure. (using the same convention of upper case for constant values). This way I'm sure the values in the business layer and data layer are consistent. This method is not nice for lots of values.
What are my other options?
I'm using SQL Server 2008, and C# if it makes any difference.
Update Because I'm using .Net is there any way that user defined (CLR) types can help?
This might be controversial: my take is don't use enumerations in T-SQL. T-SQL isn't really designed in a way that makes enums useful, the way they are in other languages. To me, in T-SQL, they just add effort and complexity without the benefit seen elsewhere.
I can suggest two different approaches:
1) Define an Enumeration table with a small integer identity column (smallint in the example below) as the primary key and the enum value as a unique index; e.g.
CREATE TABLE [dbo].[Market](
[MarketId] [smallint] IDENTITY(1,1) NOT NULL,
[MarketName] [varchar](32) COLLATE Latin1_General_CS_AS NOT NULL,
CONSTRAINT [PK_Market] PRIMARY KEY CLUSTERED
(
[MarketId] ASC
) ON [PRIMARY]
) ON [PRIMARY]
Then either:
Have your application load the enumeration to primary key value mapping on start-up (assuming this will remain constant).
Define a function to translate enumeration values to primary key values. This function can then be used by stored procs inserting data into other tables in order to determine the foreign key to the enumeration table.
2) As per (1) but define each primary key value to be a power of 2. This allows another table to reference multiple enumeration values directly without the need for an additional association table. For example, suppose you define a Colour enumeration table with values: {1, 'Red'}, {2, 'Blue'}, {4, 'Green'}. Another table could reference Red and Green values by including the foreign key 5 (i.e. the bit-wise OR of 1 and 4).
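On the application side, the power-of-two scheme from approach (2) maps naturally onto a flags type. Here is a minimal sketch in Python using the Colour values from the example above (the colours_in helper is hypothetical).

```python
from enum import IntFlag


class Colour(IntFlag):
    RED = 1
    BLUE = 2
    GREEN = 4


def colours_in(stored_value):
    """List the colour names packed into one stored integer."""
    packed = Colour(stored_value)
    return [colour.name for colour in Colour if colour & packed]


# Combining Red and Green gives the single value that would be stored.
stored = int(Colour.RED | Colour.GREEN)
print(stored, colours_in(stored))
```

Note that a combined value such as 5 is no longer a key of any single row in the enumeration table, so the database cannot enforce it with an ordinary foreign key constraint; the check has to live in code or in a custom constraint.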
Scalar user-defined function? Not perfect, but functional...
CREATE FUNCTION dbo.ufnRGB (
@Colour varchar(20)
)
RETURNS int
AS
BEGIN
DECLARE @key int
IF @Colour = 'Blue'
SET @key = 1
ELSE IF @Colour = 'Red'
SET @key = 2
ELSE IF @Colour = 'Green'
SET @key = 3
RETURN @key
END
I don't like the idea of defining what are effectively constants for stored procedures in multiple places - this seems like a maintenance nightmare and is easily susceptible to errors (typos etc). In fact, I can't really see many circumstances when you would need to do such a thing?
I would definitely keep all enumeration definitions in one place - in your C# classes. If that means having to pass them in to your procedures every time, so be it. At least that way they are only ever defined in one place.
To make this easier you could write some helper methods for calling your procedures that automatically pass the enum parameters in for you. So you call a helper method with just the procedure name and the "variable" parameters and then the helper method adds the rest of the enumeration parameters for you.
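The helper idea can be sketched as follows, in Python for brevity (the same shape would work in C#). The with_enum_parameters function and the @ENUMNAME_MEMBER naming are assumptions, chosen to line up with constants such as @COLOR_RED in the question; actually executing the procedure is left out.

```python
from enum import IntEnum


class Color(IntEnum):
    RED = 1
    GREEN = 2
    BLUE = 3


def with_enum_parameters(variable_params, *enums):
    """Merge caller-supplied parameters with one @ENUM_MEMBER parameter per enum member."""
    params = {
        f"@{enum.__name__.upper()}_{member.name}": int(member)
        for enum in enums
        for member in enum
    }
    # Caller-supplied values go in last so they are never silently dropped.
    params.update(variable_params)
    return params


print(with_enum_parameters({"@orderId": 7}, Color))
```

The enumeration stays defined once, in application code, and every procedure call receives the same values without the caller listing them by hand.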
How about using a scalar function as a constant. A naming convention would make their usage close to enumerations:
CREATE FUNCTION COLOR_RED()
RETURNS INT
AS
BEGIN
RETURN 1
END
CREATE FUNCTION COLOR_GREEN()
RETURNS INT
AS
BEGIN
RETURN 2
END
...
