LINQ-to-SQL orderby question - c#

I have a LINQ-to-SQL query that orders on an nvarchar field called CustomerReference. The problem is that references starting with a capital letter sort after those starting with a lower-case letter, and I need it the other way around. For example, if I have the following rows:
d93838
D98484
They are currently returned in that order, but I need the order reversed, like this:
D98484
d93838
Any ideas guys? Thanks

This assumes the format [A-Za-z]\d+ and will put b3432 before C1234 but after B9999:
list.OrderBy(l => l.CustomerReference.Substring(0, 1).ToLower())
    .ThenByDescending(l => l.CustomerReference.Substring(0, 1).ToUpper() == l.CustomerReference.Substring(0, 1))
    .ThenBy(l => l.CustomerReference)
EDIT: I was asked for the SQL too, so this is what LINQPad generates:
-- Region Parameters
DECLARE @p0 Int SET @p0 = 0
DECLARE @p1 Int SET @p1 = 1
DECLARE @p2 Int SET @p2 = 0
DECLARE @p3 Int SET @p3 = 1
DECLARE @p4 Int SET @p4 = 0
DECLARE @p5 Int SET @p5 = 1
-- EndRegion
SELECT [T0].CustomerReference FROM [dbo].[test] AS [t0]
ORDER BY LOWER(SUBSTRING([t0].[CustomerReference], @p0 + 1, @p1)),
    (CASE
        WHEN UPPER(SUBSTRING([t0].[CustomerReference], @p2 + 1, @p3)) = SUBSTRING([t0].[CustomerReference], @p4 + 1, @p5) THEN 1
        WHEN NOT (UPPER(SUBSTRING([t0].[CustomerReference], @p2 + 1, @p3)) = SUBSTRING([t0].[CustomerReference], @p4 + 1, @p5)) THEN 0
        ELSE NULL
    END) DESC, [t0].[CustomerReference]

In most culture-aware collations, lower-case sorts before its upper-case equivalent. (Ordinal code-point order is the opposite: 'D' is U+0044 and 'd' is U+0064.) Within a given collation you won't be able to get SQL Server to change that, so the next best thing is to bring the rows back unsorted and write a custom comparer. Note that the inbuilt culture-sensitive .NET comparers will also treat lower-case as either first or equal (compared to the upper-case equivalent), depending on the comparer.
However! Unless you limit yourself to very simple examples (ASCII etc.), ordering "alike" characters is a very non-trivial exercise. Even if we ignore the Turkish I / İ / ı / i, accented characters are going to cause you problems.
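As a rough illustration of the "bring it back unsorted and sort it yourself" approach, here is a minimal Python sketch of the ordering rule. The name reference_sort_key and the sample values are made up for the example, and it assumes simple ASCII references of the form [A-Za-z]\d+; it is not a general-purpose collation.

```python
def reference_sort_key(reference: str):
    """Sort key: group by first letter ignoring case, upper-case initial first."""
    first = reference[:1]
    # False sorts before True, so an upper-case initial wins within the same letter.
    return (first.lower(), not first.isupper(), reference)


# Hypothetical sample data in the [A-Za-z]\d+ format.
references = ["d93838", "D98484", "C1234", "b3432", "B9999"]
ordered = sorted(references, key=reference_sort_key)
print(ordered)
```

The same three-part key mirrors the OrderBy / ThenByDescending / ThenBy chain: letter first, then case, then the full string as a tie-breaker.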

TSQL MD5 generation with UTF8

I have a .NET function MD5 that, when run on "146.185.59.178acu-cell.com", returns f36674ed3dbcb151e1c0dfe4acdbb9f5:
public static String MD5(String s)
{
    using (var provider = System.Security.Cryptography.MD5.Create())
    {
        StringBuilder builder = new StringBuilder();
        foreach (Byte b in provider.ComputeHash(Encoding.UTF8.GetBytes(s)))
            builder.Append(b.ToString("x2").ToLower());
        return builder.ToString();
    }
}
I wrote the same code in T-SQL, but for some reason only the varchar returns the expected result. The nvarchar returns a different MD5: f04b83328560f1bd1c08104b83bc30ea
declare @v varchar(150) = '146.185.59.178acu-cell.com'
declare @nv nvarchar(150) = '146.185.59.178acu-cell.com'
select LOWER(CONVERT(VARCHAR(32), HashBytes('MD5', @v), 2))
--f36674ed3dbcb151e1c0dfe4acdbb9f5
select LOWER(CONVERT(VARCHAR(32), HashBytes('MD5', @nv), 2))
--f04b83328560f1bd1c08104b83bc30ea
I'm not sure what is going on here, because I expect the nvarchar to return f36674ed3dbcb151e1c0dfe4acdbb9f5, as it does in .NET.
You get different hashes because the binary representation of the text is different. The following query demonstrates this:
declare @v varchar(150) = '146.185.59.178acu-cell.com'
declare @nv nvarchar(150) = '146.185.59.178acu-cell.com'
select convert(varbinary(max), @v) -- 0x3134362E3138352E35392E3137386163752D63656C6C2E636F6D
select convert(varbinary(max), @nv) -- 0x3100340036002E003100380035002E00350039002E003100370038006100630075002D00630065006C006C002E0063006F006D00
The extra 0 bytes for the nvarchar are due to the fact that it's a 2-byte Unicode (UTF-16) datatype, whereas the .NET code hashes the UTF-8 bytes, which for ASCII text are one byte per character. Refer to MSDN for more information on Unicode in SQL Server.
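The same effect can be reproduced outside SQL Server. The following is a small Python sketch, assuming that nvarchar corresponds to UTF-16 little-endian bytes and that the varchar bytes of this ASCII-only string equal its UTF-8 bytes; the variable names are illustrative only. If that assumption holds, the two digests should line up with the two values reported in the question.

```python
import hashlib

text = "146.185.59.178acu-cell.com"

# Bytes as produced by Encoding.UTF8.GetBytes (and by varchar for ASCII text).
utf8_bytes = text.encode("utf-8")
# Bytes as assumed for nvarchar: two bytes per character, low byte first.
utf16_bytes = text.encode("utf-16-le")

utf8_md5 = hashlib.md5(utf8_bytes).hexdigest()
utf16_md5 = hashlib.md5(utf16_bytes).hexdigest()

print(utf8_bytes.hex().upper())
print(utf16_bytes.hex().upper())
print(utf8_md5)
print(utf16_md5)
```

The hash function is identical in both cases; only the input bytes differ, which is why converting the nvarchar value to UTF-8 before hashing brings the results back in line.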
Turns out I need to explicitly convert NVarChar to UTF-8.
Found this code on the net:
CREATE FUNCTION [dbo].[fnUTF8] (
    @String NVarChar(max)
) RETURNS VarChar(max) AS BEGIN
    DECLARE @Result VarChar(max)
           ,@Counter Int
           ,@Len Int
    SELECT @Result = ''
          ,@Counter = 1
          ,@Len = Len(@String)
    WHILE (@@RowCount > 0)
        SELECT @Result = @Result
            -- lead byte (none for single-byte code points)
            + CASE WHEN Code < 128 THEN ''
                   WHEN Code < 2048 THEN Char(192 + Code / 64)
                   ELSE Char(224 + Code / 4096)
              END
            -- single byte, or first continuation byte
            + CASE WHEN Code < 128 THEN Char(Code)
                   WHEN Code < 2048 THEN Char(128 + Code % 64)
                   ELSE Char(128 + Code / 64 % 64)
              END
            -- final continuation byte of a three-byte sequence
            + CASE WHEN Code < 2048 THEN ''
                   ELSE Char(128 + Code % 64)
              END
            ,@Counter = @Counter + 1
        FROM (SELECT UniCode(SubString(@String, @Counter, 1)) AS Code) C
        WHERE @Counter <= @Len
    RETURN @Result
END
GO
And now I use it like this:
select LOWER(CONVERT(VARCHAR(32), HashBytes('MD5', [dbo].[fnUTF8](@nv)), 2))

Efficient SQL Bucket Sort based on Length of Substring Match

Given a SQL database table containing strings indexed alphabetically, how might I perform a search query that orders by substring match?
For example, given the data set:
bad
banana
bandana
banker
bed
brother
And the search string band, I would expect the results ordered as follows
bandana (index 0-3 matched)
banana (index 0-2 matched)
banker
bad (index 0-1 matched)
bed (index 0 matched)
brother
Note that we only care about the length of the substring matched. The matches that fall into each bucket don't have to be sorted alphabetically; I only care about the bucket they fall into.
So I guess naively the problem involves:
Finding the length of the substring match against my input for each row
Putting each row into the appropriate bucket based on the match length
Ordering the buckets in descending order, i.e. (4 chars matched, 3 chars matched, 2..)
But this sounds expensive, so how could I implement this in SQL or C#, and do it efficiently?
Is there a similar problem/pattern I could benefit from here?
Many thanks
Not sure if it is the most efficient way, but:
Using a numbers table, split the strings into characters and join this to a split of the search string, then just order by count and the string.
DECLARE @t TABLE ( string VARCHAR(50) )
INSERT INTO @t (string)
VALUES
    ('bad'),
    ('banana'),
    ('bandana'),
    ('banker'),
    ('bed'),
    ('brother')
DECLARE @search VARCHAR(50) = 'band'
;WITH numbers AS
(
    SELECT TOP 10000 ROW_NUMBER() OVER(ORDER BY t1.number) AS n
    FROM master..spt_values t1
    CROSS JOIN master..spt_values t2
)
SELECT string
FROM @t t
CROSS APPLY (
    SELECT SUBSTRING(t.string, numbers.n, 1) c, n
    FROM numbers
    WHERE numbers.n <= LEN(string)
) s1
JOIN (
    SELECT SUBSTRING(@search, numbers.n, 1) c, n
    FROM numbers
    WHERE numbers.n <= LEN(@search)
) s2 ON s2.c = s1.c
    AND s2.n = s1.n
GROUP BY string
ORDER BY COUNT(1) DESC, string
String operations and SQL Server are not the best match, AFAIK.
My best bet would be to try a modified version of Boyer-Moore-Horspool to find the number of matching characters. However, on a miss you wouldn't skip the full word length, only the length of the maximum match. Then simply insert the row into the appropriate bucket.
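For comparison, the three naive steps from the question (measure the match length, drop each row into a bucket, emit the buckets longest-first) can be sketched in a few lines of Python. This is only an illustration on the question's sample data: the helper names common_prefix_length and bucket_by_prefix are invented here, and it measures the common prefix rather than implementing Boyer-Moore-Horspool.

```python
from collections import defaultdict


def common_prefix_length(word: str, search: str) -> int:
    """Number of leading characters that word and search share."""
    length = 0
    for a, b in zip(word, search):
        if a != b:
            break
        length += 1
    return length


def bucket_by_prefix(words, search):
    """Group words by match length; return the buckets longest match first."""
    buckets = defaultdict(list)
    for word in words:
        buckets[common_prefix_length(word, search)].append(word)
    return [buckets[n] for n in sorted(buckets, reverse=True)]


words = ["bad", "banana", "bandana", "banker", "bed", "brother"]
buckets = bucket_by_prefix(words, "band")
ordered = [word for bucket in buckets for word in bucket]
print(buckets)
print(ordered)
```

Each row is visited once and compared against at most len(search) characters, so the cost is linear in the number of rows; only the handful of bucket keys is sorted.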

SQL/C# - Primary Key error on UPSERT

UPDATE (simplified problem, removed C# from the issue)
How can I write an UPSERT that can recognize when two rows are the same in the following case...
See how there's a \b [backspace] encoded there (the weird little character)? SQL sees these as the same. While my UPSERT sees this as new data and attempts an INSERT where there should be an UPDATE.
-- UPSERT
INSERT INTO [table]
SELECT [col1] = @col1, [col2] = @col2, [col3] = @col3, [col4] = @col4
FROM [table]
WHERE NOT EXISTS
    -- race condition risk here?
    ( SELECT 1 FROM [table]
      WHERE
          [col1] = @col1
          AND [col2] = @col2
          AND [col3] = @col3)
UPDATE [table]
SET [col4] = @col4
WHERE
    [col1] = @col1
    AND [col2] = @col2
    AND [col3] = @col3
You need the @ sign (a verbatim string literal), otherwise a C# character escape sequence is hit.
C# defines the following character escape sequences:
\' - single quote, needed for character literals
\" - double quote, needed for string literals
\\ - backslash
\0 - Unicode character 0
\a - Alert (character 7)
\b - Backspace (character 8)
\f - Form feed (character 12)
\n - New line (character 10)
\r - Carriage return (character 13)
\t - Horizontal tab (character 9)
\v - Vertical tab (character 11)
\uxxxx - Unicode escape sequence for character with hex value xxxx
\xn[n][n][n] - Unicode escape sequence for character with hex value nnnn (variable length version of \uxxxx)
\Uxxxxxxxx - Unicode escape sequence for character with hex value xxxxxxxx (for generating surrogates)
After hours of tinkering it turns out I've been on a wild goose chase. The problem is very simple. I pulled my UPSERT from a popular SO post, and the code is no good. Because of the FROM [table] clause, the SELECT returns one row per existing row in the table, so on INSERT it will sometimes return more than one row, attempting to insert a row and then insert the same row again.
The fix is to remove the FROM clause:
-- UPSERT
INSERT INTO [table]
SELECT [col1] = @col1, [col2] = @col2, [col3] = @col3, [col4] = @col4
-- FROM [table] (don't use FROM; not a race condition, just a bad SELECT)
WHERE NOT EXISTS
    ( SELECT 1 FROM [table]
      WHERE
          [col1] = @col1
          AND [col2] = @col2
          AND [col3] = @col3)
UPDATE [table]
SET [col4] = @col4
WHERE
    [col1] = @col1
    AND [col2] = @col2
    AND [col3] = @col3
Problem is gone.
Thanks to all of you.
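The effect of the stray FROM can be reproduced in miniature. The sketch below uses Python's built-in sqlite3 module rather than SQL Server, with a hypothetical table t that has no primary key, so the duplicates can be counted instead of raising an error. The table, the sample rows and the function name copies_inserted are all assumptions made for the illustration.

```python
import sqlite3

NEW_ROW = {"col1": "x", "col2": "y", "col3": "z", "col4": "new"}
KEY = {name: NEW_ROW[name] for name in ("col1", "col2", "col3")}

NOT_EXISTS = """
WHERE NOT EXISTS (SELECT 1 FROM t
                  WHERE col1 = :col1 AND col2 = :col2 AND col3 = :col3)
"""
# With FROM, the constant row is selected once per existing row in t.
WITH_FROM = "INSERT INTO t SELECT :col1, :col2, :col3, :col4 FROM t" + NOT_EXISTS
# Without FROM, the constant row is selected at most once.
WITHOUT_FROM = "INSERT INTO t SELECT :col1, :col2, :col3, :col4" + NOT_EXISTS


def copies_inserted(insert_sql: str) -> int:
    """Run the insert against a fresh two-row table; count copies of NEW_ROW."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (col1 TEXT, col2 TEXT, col3 TEXT, col4 TEXT)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?, ?, ?)",
        [("a", "b", "c", "1"), ("d", "e", "f", "2")],
    )
    conn.execute(insert_sql, NEW_ROW)
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM t "
        "WHERE col1 = :col1 AND col2 = :col2 AND col3 = :col3",
        KEY,
    ).fetchone()
    conn.close()
    return count


print(copies_inserted(WITH_FROM))
print(copies_inserted(WITHOUT_FROM))
```

With a primary key on the three key columns, the second copy produced by the FROM version is what triggers the primary key violation.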
You are using '\u' which generates a Unicode character.
Your column is a varchar, which does not support Unicode characters. nvarchar would support the character.

c# ip address generator to SQL

I found the following Python code on the internet. I would like to make a table in a SQL database containing every IPv4 address there is. I don't code in Python, but it's what I found.
My questions are:
1: Is there T-SQL code I can use to generate the table (one column, i.e. 0.0.0.0-255.255.255.255)?
2: How would I do this in C#, using the fastest method possible? I know printing the results slows the console application down by 400%.
#!/usr/bin/env python
def generate_every_ip_address():
    # Lazily yield every dotted-quad address from 0.0.0.0 to 255.255.255.255.
    for octet_1 in range(256):
        for octet_2 in range(256):
            for octet_3 in range(256):
                for octet_4 in range(256):
                    yield "%d.%d.%d.%d" % (octet_1, octet_2, octet_3, octet_4)
for ip_address in generate_every_ip_address():
    print(ip_address)
Would this work?
DECLARE @a INTEGER
DECLARE @b INTEGER
DECLARE @c INTEGER
DECLARE @d INTEGER
DECLARE @IPADDRESS nvarchar(50)
SET @a = 0
WHILE @a < 256
BEGIN
    SET @b = 0
    WHILE @b < 256
    BEGIN
        SET @c = 0
        WHILE @c < 256
        BEGIN
            SET @d = 0
            WHILE @d < 256
            BEGIN
                SET @IPADDRESS = CAST(@a AS nvarchar(3)) + '.' + CAST(@b AS nvarchar(3)) + '.' + CAST(@c AS nvarchar(3)) + '.' + CAST(@d AS nvarchar(3))
                PRINT @IPADDRESS
                SET @d = @d + 1
            END
            SET @c = @c + 1
        END
        SET @b = @b + 1
    END
    SET @a = @a + 1
END
To insert in batches of 16,777,216 rows (256 x 256 x 256, one batch per value of the first octet) would be quite straightforward using the following T-SQL.
DECLARE @Counter INT
SET @Counter = 0
SET NOCOUNT ON ;
WHILE ( @Counter <= 255 )
    BEGIN
        RAISERROR('Processing %d', 0, 1, @Counter) WITH NOWAIT ;
        WITH Numbers ( N )
               AS ( SELECT CAST(number AS VARCHAR(3))
                    FROM master.dbo.spt_values
                    WHERE type = 'P'
                          AND number BETWEEN 0 AND 255
                  )
        INSERT INTO YourTable
               ( IPAddress )
        -- cast the INT counter so that + concatenates instead of attempting numeric addition
        SELECT CAST(@Counter AS VARCHAR(3)) + '.' + N1.N + '.' + N2.N + '.' + N3.N
        FROM Numbers N1 ,
             Numbers N2 ,
             Numbers N3
        SET @Counter = @Counter + 1
    END
Please just use an int IDENTITY column to store each IP address. They're only 32 bits. Fill your table up with whatever else you're storing.
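To illustrate why a 32-bit integer is enough, here is a short Python sketch that converts between the dotted form and the integer form using the standard-library ipaddress module. The helper names ip_to_int and int_to_ip are made up for the example; how the integer is then stored in SQL Server is left open.

```python
import ipaddress


def ip_to_int(dotted: str) -> int:
    """Convert a dotted-quad IPv4 string to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(dotted))


def int_to_ip(value: int) -> str:
    """Convert a 32-bit integer back to the dotted-quad string."""
    return str(ipaddress.IPv4Address(value))


print(ip_to_int("0.0.0.0"))
print(ip_to_int("255.255.255.255"))
print(int_to_ip(ip_to_int("192.168.1.1")))
```

Because the mapping is a plain bijection onto the range 0 to 2^32 - 1, the dotted form never has to be stored; it can be derived on demand.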

How can I rearrange string with SQL?

Declare @CustTotalCount as int
Declare @CustMatchCount as int
select @CustTotalCount = count(*) from ENG_CUSTOMERTALLY
select @CustMatchCount = count(*) from Task where MPDReference in (
    select ENG_CUSTOMERTALLY_CUSTOMERTASKNUMBER from dbo.ENG_CUSTOMERTALLY)
if (@CustTotalCount > @CustMatchCount)
    select distinct
        substring(ENG_CUSTOMERMYCROSS_MYTECHNIC_TASK_NO, charindex('-', ENG_CUSTOMERMYCROSS_MYTECHNIC_TASK_NO)
            + 1, 1000)
    from dbo.ENG_CUSTOMERMYCROSS where
        ENG_CUSTOMERMYCROSS_CUSTOMER_NUMBER in (
            select ENG_CUSTOMERTALLY_CUSTOMERTASKNUMBER from ENG_CUSTOMERTALLY1
            except
            select MPDReference from Task )
I can convert
- A320-200001-01-1(1)
- A320-200001-02-1(2)
- A320-200001-01-1(2)
- A320-200001-01-1(1)
- A320-200001-01-1(2)
- A320-200001-02-1(1)
TO
- 200001-01-1(1)
- 200001-02-1(2)
- 200001-01-1(2)
- 200001-01-1(1)
- 200001-01-1(2)
- 200001-02-1(1)
But I need:
- 200001-01-1
- 200001-02-1
- 200001-01-1
- 200001-01-1
- 200001-01-1
- 200001-02-1
How can I do that in SQL and C#?
Is the pattern always the same? If so, you could just use SUBSTRING to pull out the bit you want.
EDIT: To take in the additional question asked in "How can i use substring in SQL?", you could
SELECT DISTINCT SUBSTRING(....) FROM ...
as answered above: use the SUBSTRING function like you are, but with a length of 11 instead of 1000, as long as the data is always in the format you show above.
In C# it would be:
string s = "A320-200001-01-1(1)";
string result = s.Substring(s.IndexOf('-') + 1, 11); // start after the first dash
Again, this assumes the part you want is always 11 characters. Otherwise, if you always want to end before the first '(', you can use the IndexOf method/function again to find the end index and subtract the start index.
Try substring and len. This sample skips the first 5 characters and drops the last 3 (so the length is len - 8):
declare @var varchar(50)
set @var = 'A320-200001-01-1(1)'
select substring(@var, 6, len(@var) - 8)
output: 200001-01-1
In C# the functions are similar, except the index is zero-based:
string text = "A320-200001-01-1(1)";
text = text.Substring(5, text.Length - 8);
Console.WriteLine(text);
Here's a technique that uses PATINDEX, which can use wildcards.
SUBSTRING(ENG_CUSTOMERMYCROSS_MYTECHNIC_TASK_NO,
    PATINDEX('%-[0-9]%', ENG_CUSTOMERMYCROSS_MYTECHNIC_TASK_NO) + 1,
    PATINDEX('%(%', ENG_CUSTOMERMYCROSS_MYTECHNIC_TASK_NO)
        - PATINDEX('%-[0-9]%', ENG_CUSTOMERMYCROSS_MYTECHNIC_TASK_NO) - 1
)
The start of the substring is the character after the first dash that is followed by a digit ('%-[0-9]%'); matching on the first digit alone ('%[0-9]%') would hit the 3 in A320. The length value is the position of the first parenthesis ('%(%') less the starting position.
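The "start after the first dash, stop before the first parenthesis" rule shared by these answers can be checked quickly with a Python sketch. The function name strip_prefix_and_suffix is invented for the example, and it assumes every value contains both a '-' and a later '('.

```python
def strip_prefix_and_suffix(task_no: str) -> str:
    """Return the text between the first '-' and the first '(' after it."""
    start = task_no.index("-") + 1   # first character after the first dash
    end = task_no.index("(", start)  # position of the opening parenthesis
    return task_no[start:end]


samples = ["A320-200001-01-1(1)", "A320-200001-02-1(2)"]
results = [strip_prefix_and_suffix(s) for s in samples]
print(results)
```

Locating both delimiters, rather than hard-coding a length of 11, keeps the result correct if the numeric parts ever change width.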
