SQL/C# - Primary Key error on UPSERT - c#

UPDATE (simplified the problem, removed C# from the issue)
How can I write an UPSERT that can recognize when two rows are the same in the following case...
See how there's a \b [backspace] encoded there (the weird little character)? SQL sees these as the same, while my UPSERT sees this as new data and attempts an INSERT where there should be an UPDATE.
-- UPSERT
INSERT INTO [table]
SELECT [col1] = @col1, [col2] = @col2, [col3] = @col3, [col4] = @col4
FROM [table]
WHERE NOT EXISTS
-- race condition risk here?
( SELECT 1 FROM [table]
WHERE
[col1] = @col1
AND [col2] = @col2
AND [col3] = @col3)
UPDATE [table]
SET [col4] = @col4
WHERE
[col1] = @col1
AND [col2] = @col2
AND [col3] = @col3

You need the @ sign (a verbatim string literal), otherwise a C# character escape sequence is hit.
C# defines the following character escape sequences:
\' - single quote, needed for character literals
\" - double quote, needed for string literals
\\ - backslash
\0 - Unicode character 0
\a - Alert (character 7)
\b - Backspace (character 8)
\f - Form feed (character 12)
\n - New line (character 10)
\r - Carriage return (character 13)
\t - Horizontal tab (character 9)
\v - Vertical tab (character 11)
\uxxxx - Unicode escape sequence for character with hex value xxxx
\xn[n][n][n] - Unicode escape sequence for character with hex value nnnn (variable length version of \uxxxx)
\Uxxxxxxxx - Unicode escape sequence for character with hex value xxxxxxxx (for generating surrogates)

After hours of tinkering, it turns out I've been on a wild goose chase. The problem is very simple. I pulled my UPSERT from a popular SO post, and the code is no good. Because of the FROM [table] clause, the SELECT produces one row for every existing row in the table, so it can return more than one row on INSERT, thereby attempting to insert a row and then insert the same row again.
The fix is to remove the FROM clause:
-- UPSERT
INSERT INTO [table]
SELECT [col1] = @col1, [col2] = @col2, [col3] = @col3, [col4] = @col4
-- FROM [table] (Don't use FROM. Not a race condition, just a bad SELECT.)
WHERE NOT EXISTS
( SELECT 1 FROM [table]
WHERE
[col1] = @col1
AND [col2] = @col2
AND [col3] = @col3)
UPDATE [table]
SET [col4] = @col4
WHERE
[col1] = @col1
AND [col2] = @col2
AND [col3] = @col3
Problem is gone.
Thanks to all of you.
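To see why the extra FROM matters, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The table t, its two seed rows, and the parameter values are made up for illustration; the point is only that a SELECT of constants with a FROM clause yields one row per existing table row, while the same SELECT without FROM yields at most one.

```python
import sqlite3

# Hypothetical stand-in for [table]: three key columns plus one value column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (col1 TEXT, col2 TEXT, col3 TEXT, col4 TEXT, "
    "PRIMARY KEY (col1, col2, col3))"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [("a", "b", "c", "x"), ("d", "e", "f", "y")],
)

# A key that is not in the table yet, so NOT EXISTS is true.
params = {"col1": "g", "col2": "h", "col3": "i", "col4": "z"}
not_exists = (
    "WHERE NOT EXISTS (SELECT 1 FROM t "
    "WHERE col1 = :col1 AND col2 = :col2 AND col3 = :col3)"
)

# With FROM: the constant row is repeated once per existing row in t.
with_from = conn.execute(
    f"SELECT :col1, :col2, :col3, :col4 FROM t {not_exists}", params
).fetchall()

# Without FROM: at most one row comes back.
without_from = conn.execute(
    f"SELECT :col1, :col2, :col3, :col4 {not_exists}", params
).fetchall()

print(len(with_from), len(without_from))
```

Feeding the first result set into an INSERT would try to insert the same key twice, which is what trips the primary key constraint.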

You are using '\u' which generates a Unicode character.
Your column is a varchar, which does not support Unicode characters. nvarchar would support the character.

Related

Regex - How to find all occurrence of commands inserts in a text (Batch SQL)

I need to find all occurrences of INSERT commands inside a string. (An insert can contain a line break in the middle.)
For example:
...
SOME COMMAND TEXT
SOME COMMAND TEXT
INSERT INTO table (CAMPO 1, CAMPO 2)
VALUES (1, 'some text in (parentheses) ')
SOME COMMAND TEXT
INSERT INTO table (CAMPO 1, CAMPO 2) VALUES (1, 2)
SOME COMMAND TEXT BELOW INSERT IN THE SAME LINE
INSERT INTO table (CAMPO 1, CAMPO 2) VALUES (1, 2);INSERT INTO table (CAMPO 1, CAMPO 2) VALUES (1, 2);INSERT INTO table (CAMPO 1, CAMPO 2) VALUES (1, 2)
SOME COMMAND TEXT
INSERT INTO table VALUES (1, 1, 1, 1, 1);
INSERT INTO table VALUES (1, 1, 1, 1, 1) ;
INSERT INTO table VALUES (1, 1, 1, 1, 1);
INSERT INTO table VALUES (1, 1, 1, 1, 1) ;
SOME COMMAND TEXT
SOME COMMAND TEXT
...
In this case, I need a regex that gets all the inserts.
Can you guys help me, please?
Try using /^INSERT.+$/gm. Checks for beginning of line (^), INSERT (INSERT), any character (.) any (non-zero) number of times (+), end of line ($). The anchors ^ and $ ensure that the entire line matches. The m modifier treats multiple line strings as separate lines rather than a single large line. The g modifier will ensure that we do not stop at the first match.
If you would like to match specific components of the statement, you could use /^INSERT INTO (.+) VALUES (.+)$/gm and access the components through capture groups.
If you might have line breaks within the insert statements, you should try using /^INSERT INTO ((?:.|\n)+?) VALUES ((?:.|\n)+?)$/gm. With (?:.|\n)+? we're matching any character including newlines non-greedily.
To capture all the insert statements, you can use
(INSERT INTO(?s:.)*?VALUES(?s:.)*?\))
https://regex101.com/r/tejrDM/4 shows the resulting matches.
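As a rough check of the last pattern, here is a small Python sketch (Python's re module accepts the scoped (?s:.) form from version 3.6). The sample text is shortened from the question, and the names INSERT_RE, statements, and truncated are only illustrative.

```python
import re

# Pattern from the answer above; (?s:.) lets "." match newlines as well.
INSERT_RE = re.compile(r"INSERT INTO(?s:.)*?VALUES(?s:.)*?\)")

text = """SOME COMMAND TEXT
INSERT INTO table (CAMPO 1, CAMPO 2)
VALUES (1, 2)
SOME COMMAND TEXT
INSERT INTO table VALUES (1, 1, 1, 1, 1);INSERT INTO table VALUES (2, 2) ;
"""

# findall returns the full match for each statement (the pattern has no groups).
statements = INSERT_RE.findall(text)
for statement in statements:
    print(repr(statement))

# Limitation: the lazy match stops at the first ")", even inside a string literal.
truncated = INSERT_RE.findall("INSERT INTO table VALUES (1, 'a (b) c')")
print(truncated)
```

The second call shows the main caveat: a closing parenthesis inside a quoted value ends the match early, so values like 'some text in (parentheses) ' need a stricter pattern or a real SQL tokenizer.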

regular expression to capture content of a group and reuse them

How can I write a regular expression to replace
VALUES ('some text')
with
SELECT * FROM (SELECT 'some text') AS tmp...
Basically, I have an input file with multiple INSERT statements. I want to use a regex to convert each INSERT statement into an IF NOT EXISTS then INSERT statement (and run it in MySQL).
So, this is my input:
INSERT INTO table_listnames (name, address, tele) VALUES ('Rupert', 'Somewhere', '022')
and this is the desired output:
INSERT INTO table_listnames (name, address, tele)
SELECT * FROM (SELECT 'Rupert', 'Somewhere', '022') AS tmp
WHERE NOT EXISTS (
SELECT VersionNumber FROM ReleaseInfo WHERE VersionNumber = '1.0.0.1'
) LIMIT 1;
You could use
VALUES\s*\(([^()]*)\)
And replace this with
SELECT * FROM (SELECT $1) AS tmp
See a demo on regex101.com.
Broken down, this says:
VALUES # match VALUES
\s*\( # optional whitespace, then an opening (
([^()]*) # capture everything inside the parentheses (no nested parentheses)
\) # the closing )
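The same substitution can be tried out in Python, where the captured group is written \1 in the replacement instead of $1. This is only a sketch on the single statement from the question; the variable names are illustrative, and the WHERE NOT EXISTS tail from the desired output is not generated here.

```python
import re

# Same pattern as above: VALUES, optional whitespace, then one
# parenthesized list that contains no nested parentheses.
VALUES_RE = re.compile(r"VALUES\s*\(([^()]*)\)")

sql = (
    "INSERT INTO table_listnames (name, address, tele) "
    "VALUES ('Rupert', 'Somewhere', '022')"
)

# \1 re-inserts the captured value list into the replacement text.
converted = VALUES_RE.sub(r"SELECT * FROM (SELECT \1) AS tmp", sql)
print(converted)
```

Because the character class excludes parentheses, a value that itself contains ( or ) will not match and the statement is left unchanged.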

Efficient SQL Bucket Sort based on Length of Substring Match

Given a SQL database table containing strings indexed alphabetically, how might I perform a search query that orders by substring match?
For example, given the data set:
bad
banana
bandana
banker
bed
brother
And the search string band, I would expect the results ordered as follows
bandana (index 0-3 matched)
banana (index 0-2 matched)
banker
bad (index 0-1 matched)
bed (index 0 matched)
brother
Note that we only care about the length of the substring matched. The matches that fall into each bucket don't have to be sorted alphabetically, I only care about the bucket they fall into.
So I guess naively the problem involves:
Seeing the length of substring match against my input for each row
Putting each row into the appropriate bucket based on the match length
Ordering the buckets in a descending order, ie (4 chars matched, 3 chars matched, 2..)
But this sounds expensive, so how could I implement this in SQL or C#, and do it efficiently?
Is there a similar problem/pattern I could benefit from here?
Many thanks
Not sure if it is the most efficient way, but:
Using a numbers table, split the strings into chars and join this to a split of the search string, then just order by the count of matching positions and the string.
DECLARE @t TABLE ( string VARCHAR(50) )
INSERT INTO @t (string)
VALUES
('bad'),
('banana'),
('bandana'),
('banker'),
('bed'),
('brother')
DECLARE @search VARCHAR(50) = 'band'
;WITH numbers AS
(
SELECT TOP 10000 ROW_NUMBER() OVER(ORDER BY t1.number) AS n
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
)
SELECT string
FROM @t t
CROSS APPLY (
SELECT SUBSTRING(t.string, numbers.n, 1) c, n
FROM numbers
WHERE numbers.n <= LEN(string)
) s1
JOIN (
SELECT SUBSTRING(@search, numbers.n, 1) c, n
FROM numbers
WHERE numbers.n <= LEN(@search)
) s2 ON s2.c = s1.c
AND s2.n = s1.n
GROUP BY string
ORDER BY COUNT(1) DESC, string
demo
String operations and SQL Server are not the best match, AFAIK.
My best bet would be to try a modified version of Boyer-Moore-Horspool to find the number of matching characters. However, on a miss you wouldn't skip the full word length, only the length of the maximum match. Then simply insert into the appropriate bucket.
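For comparison, the bucketing described in the question (length of the matching prefix, buckets in descending order) can be sketched outside the database in a few lines of Python. The function name prefix_match_length is made up for this example, and this is a plain linear scan, not the Boyer-Moore-Horspool variant suggested above.

```python
from itertools import takewhile


def prefix_match_length(candidate, search):
    """Count how many leading characters of candidate agree with search."""
    pairs = zip(candidate, search)
    return sum(1 for _ in takewhile(lambda pair: pair[0] == pair[1], pairs))


words = ["bad", "banana", "bandana", "banker", "bed", "brother"]
search = "band"

# sorted() is stable, so words in the same bucket keep their input order.
ranked = sorted(words, key=lambda word: -prefix_match_length(word, search))
print(ranked)
```

Each word is compared against at most len(search) characters, so the cost is dominated by the sort. Note that this counts only the leading run of matches, whereas the numbers-table query counts every matching position.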

LINQ-to-SQL orderby question

I have a LINQ-to-SQL query, and I order on an nvarchar field called CustomerReference. The problem is that references that start with a capital letter seem to come after ones without capitals, when I need this the other way around. For example, if I have the following rows:
d93838
D98484
It is currently ordered in that sequence right now, however I need it reversed - so it'd be like this
D98484
d93838
Any ideas guys? Thanks
This assumes the Format [A-Za-z]\d+ and will put b3432 before C1234 but after B9999
list.OrderBy (l => l.CustomerReference.Substring(0,1).ToLower())
.ThenByDescending(l =>l.CustomerReference.Substring(0,1).ToUpper()==l.CustomerReference.Substring(0,1))
.ThenBy (l =>l.CustomerReference )
EDIT: I was asked for the SQL too so this is what LINQPad does
-- Region Parameters
DECLARE @p0 Int SET @p0 = 0
DECLARE @p1 Int SET @p1 = 1
DECLARE @p2 Int SET @p2 = 0
DECLARE @p3 Int SET @p3 = 1
DECLARE @p4 Int SET @p4 = 0
DECLARE @p5 Int SET @p5 = 1
-- EndRegion
SELECT [t0].[CustomerReference] FROM [dbo].[test] AS [t0]
ORDER BY LOWER(SUBSTRING([t0].[CustomerReference], @p0 + 1, @p1)),
(CASE
WHEN UPPER(SUBSTRING([t0].[CustomerReference], @p2 + 1, @p3)) = SUBSTRING([t0].[CustomerReference], @p4 + 1, @p5) THEN 1
WHEN NOT (UPPER(SUBSTRING([t0].[CustomerReference], @p2 + 1, @p3)) = SUBSTRING([t0].[CustomerReference], @p4 + 1, @p5)) THEN 0
ELSE NULL
END) DESC, [t0].[CustomerReference]
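The three-key idea in this answer (first letter ignoring case, then upper-case before lower-case, then the full reference) is easy to check in isolation. Below is a minimal Python sketch of the same ordering; reference_sort_key is a hypothetical helper, and the sample values are the ones mentioned in the question and answer.

```python
def reference_sort_key(ref):
    """Order by first letter (case-insensitive), upper-case first, then the text."""
    first = ref[:1]
    # False sorts before True, so an upper-case first letter wins the tie.
    return (first.lower(), first.islower(), ref)


refs = ["d93838", "D98484", "C1234", "b3432", "B9999"]
ordered = sorted(refs, key=reference_sort_key)
print(ordered)
```

As in the answer, b3432 lands before C1234 but after B9999, and D98484 comes before d93838.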
In most culture-aware orderings, lower-case comes first. (Ordinal, code-point order is the other way around: 'A' is U+0041 and 'a' is U+0061.) If you can't change the ordering SQL Server applies, the next best thing is to bring the data back unsorted and write a custom comparer. Note that the built-in culture-sensitive and case-insensitive .NET comparers will also treat lower-case as either first or equal (compared to the upper-case equivalent), depending on the comparer.
However! Unless you limit yourself to very simple examples (ASCII etc.), ordering "alike" characters is a very non-trivial exercise. Even if we ignore the Turkish I / İ / ı / i, accented characters are going to cause you problems.

How can I rearrange string with SQL?

Declare @CustTotalCount as int
Declare @CustMatchCount as int
select @CustTotalCount = count(*) from ENG_CUSTOMERTALLY
select @CustMatchCount = count(*) from Task where MPDReference in(
select ENG_CUSTOMERTALLY_CUSTOMERTASKNUMBER from dbo.ENG_CUSTOMERTALLY)
if(@CustTotalCount>@CustMatchCount)
select distinct
substring(ENG_CUSTOMERMYCROSS_MYTECHNIC_TASK_NO, charindex('-', ENG_CUSTOMERMYCROSS_MYTECHNIC_TASK_NO)
+ 1, 1000)
from dbo.ENG_CUSTOMERMYCROSS where
ENG_CUSTOMERMYCROSS_CUSTOMER_NUMBER in(
select ENG_CUSTOMERTALLY_CUSTOMERTASKNUMBER from ENG_CUSTOMERTALLY1
except
select MPDReference from Task )
I can convert
- A320-200001-01-1(1)
- A320-200001-02-1(2)
- A320-200001-01-1(2)
- A320-200001-01-1(1)
- A320-200001-01-1(2)
- A320-200001-02-1(1)
TO
- 200001-01-1(1)
- 200001-02-1(2)
- 200001-01-1(2)
- 200001-01-1(1)
- 200001-01-1(2)
- 200001-02-1(1)
But I need to :
- 200001-01-1
- 200001-02-1
- 200001-01-1
- 200001-01-1
- 200001-01-1
- 200001-02-1
How can I do that in SQL and C#?
Is the pattern always the same? If so, you could just use SUBSTRING to pull out the bit you want.
EDIT: To take in additional stuff asked in How can i use substring in SQL?
You could
SELECT DISTINCT SUBSTRING(....) FROM ...
As answered above, use the SUBSTRING method like you are, but use a length of 11 instead of 1000, as long as the data is always in the format you show above.
In C# it would be:
string s = "A320-200001-01-1(1)";
// IndexOf('-') points at the dash itself, so start one character later.
string result = s.Substring(s.IndexOf('-') + 1, 11);
Again, this assumes the part you want is always 11 characters. Otherwise, if you always want to end before the first '(', use the IndexOf method again to find the end index and subtract the start index to get the length.
Try substring and len. This sample cuts the first 5 and the last 3 characters (8 in total):
declare @var varchar(50)
set @var = 'A320-200001-01-1(1)'
select substring(@var, 6, len(@var) - 8)
output: 200001-01-1
In C#, the functions are similar, except that the index is zero-based:
string var = "A320-200001-01-1(1)";
var = var.Substring(5, var.Length - 8);
Console.WriteLine(var);
Here's a technique that uses PATINDEX, which can use wildcards.
SUBSTRING(ENG_CUSTOMERMYCROSS_MYTECHNIC_TASK_NO,
PATINDEX('%-[0-9]%', ENG_CUSTOMERMYCROSS_MYTECHNIC_TASK_NO) + 1,
PATINDEX('%(%', ENG_CUSTOMERMYCROSS_MYTECHNIC_TASK_NO)
- PATINDEX('%-[0-9]%', ENG_CUSTOMERMYCROSS_MYTECHNIC_TASK_NO) - 1
)
The start of your substring is one character past the first dash that is followed by a digit ('%-[0-9]%'); a bare '%[0-9]%' would match the 3 in A320. The length is the position of the first parenthesis ('%(%') less the starting position.
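The rule these answers converge on, keeping everything between the first dash and the first opening parenthesis, can also be sketched in Python to check it against the sample data. core_task_number is a hypothetical helper name, and the fallback for values without a dash or parenthesis is an assumption not covered by the question.

```python
def core_task_number(task_no):
    """Return the text between the first '-' and the first '(' after it."""
    start = task_no.find("-") + 1      # 0 when there is no dash at all
    end = task_no.find("(", start)
    if end == -1:                      # no parenthesis: keep the rest
        end = len(task_no)
    return task_no[start:end]


samples = [
    "A320-200001-01-1(1)",
    "A320-200001-02-1(2)",
    "A320-200001-01-1(2)",
]

# A set mirrors the SELECT DISTINCT in the original query.
distinct_numbers = sorted({core_task_number(s) for s in samples})
print(distinct_numbers)
```

Unlike the fixed-length approaches, this does not assume the wanted part is always 11 characters long.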
