My data has duplicate values in a single column. I want to run the data through a script component that takes all duplicate values and appends incremental numbers to them so that they become unique.
Is it possible to do this with an Aggregate Component?
For example, my data may look like this:
Column1 and Column2 together are used as my primary key, so I need the values in Column2 to be unique.
After Appending numbers to the duplicates, it would look like this (notice 'C' does not have a number):
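select tt.*, tt.col2 + '.' + cast(tt.rn as varchar(10))
from ( select t.*
            , row_number() over (partition by col2 order by ?) as rn  -- replace ? with the column that defines the numbering order
            , count(*) over (partition by col2) as cnt
       from your_table t  -- placeholder name for your source table
     ) tt
I noticed C does not have a number. I will leave that as an exercise for you. Hint: use cnt.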
DECLARE @a TABLE (col2 varchar(20));
INSERT INTO @a VALUES ('a'), ('a') , ('a'), ('b'), ('c'), ('c');
select aa.*, aa.col2 + '.' + cast(rn as varchar)
from ( select a.*
            , row_number() over (partition by col2 order by col2) as rn
            , count(*) over (partition by col2) as cnt
       from @a a
     ) aa
where aa.cnt > 1
order by aa.col2;
update aa
set aa.col2 = aa.col2 + '.' + cast(rn as varchar)
from ( select a.*
            , row_number() over (partition by col2 order by col2) as rn
            , count(*) over (partition by col2) as cnt
       from @a a
     ) aa
where aa.cnt > 1;
select * from @a a
order by a.col2;
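As a rough cross-check of the rn/cnt idea above, here is a minimal sketch that runs the same window-function logic in SQLite through Python's standard `sqlite3` module. This is an assumption for illustration only: the function name `number_duplicates`, the table `a`, and the `id` column are hypothetical, and SQLite concatenates with `||` where T-SQL uses `+` plus a cast.

```python
import sqlite3


def number_duplicates(values):
    """Append '.1', '.2', ... only to values that occur more than once."""
    conn = sqlite3.connect(":memory:")
    # id preserves the input order; it is a hypothetical helper column
    conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY, col2 TEXT)")
    conn.executemany("INSERT INTO a (col2) VALUES (?)", [(v,) for v in values])
    rows = conn.execute(
        """
        SELECT CASE WHEN cnt > 1 THEN col2 || '.' || rn ELSE col2 END
        FROM (
            SELECT id, col2,
                   ROW_NUMBER() OVER (PARTITION BY col2 ORDER BY id) AS rn,
                   COUNT(*)     OVER (PARTITION BY col2)             AS cnt
            FROM a
        )
        ORDER BY id
        """
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


print(number_duplicates(["a", "a", "a", "b", "c", "c"]))
```

The `CASE WHEN cnt > 1` branch is what leaves a value that occurs only once without a suffix, which is the behaviour the hint about cnt points at.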
Related
How to compare a delimited string to a column value in SQL without considering sequence?
Suppose I have a value in a SQL column [fruits]: mango, apple, cherry... I have a list in ASP.NET C#: cherry, mango, apple... I want to write a SQL query that matches the list against the table value regardless of order.
I suggest that you look at the fabulous answers in this SO question
How to split a comma-separated value to columns
That said, your solution should be to pass each column that contains words to this function and then store the results in a table along with a column ID.
So "mango,apple,cherry" becomes a table with values
ColID Value
_______________
1 mango
1 apple
1 cherry
Now order the tables by ColID ASC, Value ASC and compare both the tables.
This should do it.
DECLARE @str NVARCHAR(MAX)
, @Delim NVARCHAR(255)
SELECT @str = 'cherry,mango,peach,apple'
SELECT @Delim = ','
CREATE TABLE #Fruits ( Fruit VARCHAR(255) )
INSERT INTO #Fruits
( Fruit )
VALUES ( 'cherry' ),
( 'Mango' ),
( 'Apple' ) ,
( 'Banana' )
;WITH lv0 AS (SELECT 0 g UNION ALL SELECT 0)
,lv1 AS (SELECT 0 g FROM lv0 a CROSS JOIN lv0 b) -- 4
,lv2 AS (SELECT 0 g FROM lv1 a CROSS JOIN lv1 b) -- 16
,lv3 AS (SELECT 0 g FROM lv2 a CROSS JOIN lv2 b) -- 256
,lv4 AS (SELECT 0 g FROM lv3 a CROSS JOIN lv3 b) -- 65,536
,lv5 AS (SELECT 0 g FROM lv4 a CROSS JOIN lv4 b) -- 4,294,967,296
,Tally_CTE (n) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM lv5)
SELECT SUBSTRING(@str, N, CHARINDEX(@Delim, @str + @Delim, N) - N) AS Item
INTO #StrTable
FROM Tally_CTE
WHERE N BETWEEN 1 AND DATALENGTH(@str) + DATALENGTH(@Delim)
AND SUBSTRING(@Delim + @str, N, LEN(@Delim)) = @Delim;
--#############################################################################
-- in both
--#############################################################################
SELECT *
FROM #Fruits F
JOIN #StrTable ST ON F.Fruit = ST.Item
--#############################################################################
-- in table but not string
--#############################################################################
SELECT *
FROM #Fruits F
LEFT JOIN #StrTable ST ON ST.Item = F.Fruit
WHERE ST.Item IS NULL
--#############################################################################
-- in string but not table
--#############################################################################
SELECT *
FROM #StrTable ST
LEFT JOIN #Fruits F ON ST.Item = F.Fruit
WHERE F.Fruit IS NULL
GO
DROP TABLE #Fruits
DROP TABLE #StrTable
You can use the string_split function to do this. I tested this on SQL Server 2017 CTP 2.0, but it should work on 2016 too.
drop table if exists dbo.Fruits;
create table dbo.Fruits (
Fruits varchar(100)
);
insert into dbo.Fruits (Fruits)
values ('cherry,mango,apple'), ('peanut,cherry,mango'),
('apple,cherry,mango')
declare @str varchar(100) = 'apple,mango,cherry';
select
    tt.Fruits
  , COUNT(tt.value) as Value01
  , COUNT(app.value) as Value02
from (
    select
        *
    from dbo.Fruits f
    outer apply string_split (f.Fruits, ',') t
) tt
left join string_split (@str, ',') app on tt.value = app.value
group by tt.Fruits
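The idea shared by the answers above (split both sides, then compare the items without regard to order) can be sketched outside the database as well. This is only an illustration in Python: the helper names `normalize` and `same_items` are hypothetical, and lowercasing is an assumption that mimics a case-insensitive collation.

```python
def normalize(csv_text):
    """Split on commas, trim and lowercase each item, drop blanks, and sort."""
    items = (item.strip().lower() for item in csv_text.split(","))
    return sorted(item for item in items if item)


def same_items(left, right):
    """True when both delimited strings hold the same items in any order."""
    return normalize(left) == normalize(right)


stored = ["cherry,mango,apple", "peanut,cherry,mango", "apple,cherry,mango"]
wanted = "apple,mango,cherry"
for value in stored:
    print(value, same_items(value, wanted))
```

Sorting keeps duplicates, so 'apple,apple' and 'apple' are treated as different here; comparing sets instead would ignore repeats.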
I would like to transpose the data from my table and build some plots in Power BI.
Here is how I fill my database from my application:
using (SqlCommand cmd = connect.CreateCommand())
{
    cmd.CommandText = @"INSERT INTO PoD_NewPriceList_Data
                            (ID, Product_Barcode, Product_Name,
                             Store_Price, Internet_Price, InsertDate)
                        VALUES (@ID, @Product_Barcode, @Product_Name,
                                @Store_Price, @Internet_Price, @InsertDate)";
cmd.Parameters.Add("Product_Barcode", SqlDbType.NVarChar).Value = barcode;
cmd.Parameters.Add("Product_Name", SqlDbType.NVarChar).Value = PriceList.name;
cmd.Parameters.Add("Store_Price", SqlDbType.Float).Value = Convert.ToDouble(storePrice, CultureInfo.InvariantCulture);
cmd.Parameters.Add("Internet_Price", SqlDbType.Float).Value = Convert.ToDouble(PriceList.price, CultureInfo.InvariantCulture);
cmd.Parameters.Add("InsertDate", SqlDbType.DateTime).Value = InsertDate.AddDays(2);
cmd.Parameters.Add("ID", SqlDbType.Int).Value = barcode.GetHashCode();
result = result && (cmd.ExecuteNonQuery() > 0);
}
In SQL Server Management Studio, this is how I query the table:
SELECT
[ID], [Product_Barcode], [Product_Name],
[Store_Price], [Internet_Price], [InsertDate]
FROM
[dbo].[PoD_NewPriceList_Data]
and I get the following output:
The main issue is that, to create the requested plots in Power BI, I need my data to look as follows:
F5321
Product_Name Sony Xperia...
Store_Price 399
Internet_Price 327.51
InsertDate 2017.04.27
Any help would be much appreciated.
Check and modify this SQL script. I use the @t table variable; replace it with your table name [PoD_NewPriceList_Data].
DECLARE @t TABLE (
    id int,
    product_barcode varchar(max),
    product_name varchar(max),
    store_price int,
    internet_price decimal(10, 2), /* plain decimal defaults to scale 0 and would round 255.1 to 255 */
    insert_date date
)
INSERT INTO @t VALUES (1,'F5321', 'Sony Xperia', 399, 255.1, '2017-04-25')
INSERT INTO @t VALUES (2,'F5833', 'Sony Xperia XZ', 458, 398.2, '2017-04-26')
INSERT INTO @t VALUES (3,'F5121', 'Sony Xperia XA Rose', 161, 155.6, '2017-04-27')
IF OBJECT_ID ('tempdb..#Unpivoted') IS NOT NULL
    DROP TABLE #Unpivoted
IF OBJECT_ID ('tempdb..#Transposed') IS NOT NULL
    DROP TABLE #Transposed
/* Unpivot table to get rows instead of columns */
SELECT *, ROW_NUMBER() OVER (ORDER BY (SELECT 0)) as rn
INTO #Unpivoted
FROM (SELECT product_barcode, product_name,
             CAST(store_price as varchar(max)) store_price,
             CAST(internet_price as varchar(max)) internet_price,
             CAST(insert_date as varchar(max)) as insert_date
      FROM @t) src
UNPIVOT (
    value FOR field IN (
        product_barcode, product_name, store_price, internet_price, insert_date
    )
) unpiv
CREATE TABLE #Transposed
    (Field varchar(50) PRIMARY KEY NOT NULL )
DECLARE @SQL NVARCHAR(MAX)
SELECT @SQL = STUFF((
    SELECT 'ALTER TABLE #Transposed ADD item' +
           RIGHT('000' + CAST(sv.number AS VARCHAR(3)), 3) + ' varchar(max) '
    FROM [master].dbo.spt_values sv
    WHERE sv.[type] = 'p'
    AND sv.number BETWEEN 1 AND (SELECT COUNT(*) FROM @t)
    FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 0, '')
Exec(@SQL) /* Dynamically create columns */
INSERT INTO #Transposed (Field) SELECT DISTINCT Field FROM #Unpivoted
/*populate field names*/
DECLARE @fieldCount int = (SELECT COUNT(*) FROM #Transposed)
/* using rn to filter proper record from transposed table */
SELECT @SQL = STUFF((
    SELECT '
UPDATE #Transposed SET item' + RIGHT('000' + CAST(sv.number AS VARCHAR(3)), 3)
    + ' = up.value FROM #Transposed t CROSS APPLY
( SELECT TOP 1 u.value FROM #Unpivoted u WHERE u.field = t.field AND u.rn > '
    + CAST((sv.number-1)*@fieldCount AS VARCHAR(10)) + ' ORDER BY rn) up '
    FROM [master].dbo.spt_values sv
    WHERE sv.[type] = 'p'
    AND sv.number BETWEEN 1 AND (SELECT COUNT(*) FROM @t)
    FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 0, '')
Exec(@SQL) /*Dynamically fill in values */
SELECT t.* FROM #Transposed t
OUTER APPLY (SELECT TOP 1 rn FROM #Unpivoted u WHERE u.field=t.field) up
ORDER BY up.rn ASC /* add a link to Unpivoted to fix the item order */
DROP TABLE #Unpivoted
DROP TABLE #Transposed
It does what you need in several steps:
1. Converts columns to rows with UNPIVOT. Note that you have to CAST all the values to exactly the same type. It also adds a row number (rn), which is used later to pick the right rows when filling in the values.
2. Creates a temp table with a dynamic number of columns corresponding to the number of rows.
3. Fills the column names, as rows, into the dynamically created table.
4. Fills the values into the dynamically created table.
Credits to this answer and this answer.
Of course the number of columns is limited here, so if you try to convert many rows into columns, you get:
Cannot create a row of size 8066 which is greater than the allowable
maximum row size of 8060.
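To make the target shape concrete, here is a minimal sketch of the same transposition done in application code rather than in T-SQL. It is an illustration under assumptions: the function name `transpose` is hypothetical, and the `item001`, `item002`, ... column names simply follow the naming used in the script above.

```python
def transpose(rows, fields):
    """Turn one-record-per-row data into one-field-per-row data.

    Every value is converted to str, mirroring the CAST to a common type
    that UNPIVOT requires.
    """
    result = []
    for field in fields:
        line = {"Field": field}
        for position, row in enumerate(rows, start=1):
            line[f"item{position:03d}"] = str(row[field])
        result.append(line)
    return result


rows = [
    {"product_barcode": "F5321", "product_name": "Sony Xperia", "store_price": 399},
    {"product_barcode": "F5833", "product_name": "Sony Xperia XZ", "store_price": 458},
]
for line in transpose(rows, ["product_barcode", "product_name", "store_price"]):
    print(line)
```

Doing this step in the application (or in a Power BI transform) avoids the row-size limit mentioned above, because no wide SQL table is ever created.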
I've got a stored procedure which joins a number of tables to produce a large resultset which is then returned to my application. The application in turn loops through the results and combines rows on a particular ID and chooses data per row to include in a new object. This is perhaps easiest to explain using an example:
Inspection, Desc, Value
1, Description1, 3
1, Description2, 2
1, Description3, 5
This is in code turned into
Inspection, Description1, Description2, Description3
1, 3, 2, 5
The point of this is to have one row per inspection item with item description as headers and value as the cell value for inspection row and header. This is then exported to Excel.
The question is: how do I do this in SQL Server, as in expanding my SP to return a lot fewer but "wider" rows with a lot more columns?
Another complication is that one inspection may have rows which another one lacks, in that case the solution is to add an empty value or a '-'.
P.S. This is using Sql Server 2012.
If you are using MSSQL 2005 or later, you can use a pivot like this:
Test data
DECLARE @tbl TABLE(Inspection INT, [Desc] VARCHAR(100),Value INT)
INSERT INTO @tbl
VALUES
(1,'Description1', 3),
(1,'Description2', 2),
(1,'Description3', 5)
Query
SELECT
*
FROM
(
SELECT
tbl.Inspection,
tbl.[Desc],
tbl.Value
FROM
@tbl AS tbl
) AS tbl
PIVOT
(
SUM(Value)
FOR [Desc] IN ([Description1],[Description2],[Description3])
)AS pvt
Result:
Inspection, Description1, Description2, Description3
1 3 2 5
Edit
As juharr said in the comment:
The resulting column names (the values in the table) must be known when building the query, which might require another initial query to get them.
Edit 2
If you are not using MSSQL 2005 or later, or want an alternative approach, please see the following query:
SELECT
tbl.Inspection,
SUM(CASE WHEN [Desc]='Description1' THEN tbl.Value ELSE 0 END) AS Description1,
SUM(CASE WHEN [Desc]='Description2' THEN tbl.Value ELSE 0 END) AS Description2,
SUM(CASE WHEN [Desc]='Description3' THEN tbl.Value ELSE 0 END) AS Description3
FROM
@tbl AS tbl
GROUP BY
tbl.Inspection
This does not require a pivot and can be used on most RDBMSs out there.
You should use SQL Server PIVOT. It converts rows into columns. This example is an easy place to start.
If you'd like to do this dynamically, without having to know what all of the Desc values are, you can build your pivot query as a string and run it with Exec() or Execute sp_executesql.
DECLARE @Columns NVARCHAR(MAX),
        @Sql NVARCHAR(MAX)
--Build your column headers based on Distinct Desc values
SELECT @Columns = COALESCE(@Columns + ',', '') + QUOTENAME([Desc])
FROM (SELECT DISTINCT [Desc] FROM tbl) t
ORDER BY [Desc]
--Build your pivot query
SET @Sql = '
SELECT
*
FROM
tbl
PIVOT
(
MAX([Value])
FOR [Desc] IN (' + @Columns + ')
) p
'
EXEC(@Sql)
If you want '-' in place of NULL values, you'll need another variable to hold the conversion expressions for the SELECT part of your SQL.
DECLARE @Columns NVARCHAR(MAX),
        @Sql NVARCHAR(MAX),
        @ColumnAliases NVARCHAR(MAX)
--Build your pivot columns based on Distinct Desc values
SELECT @Columns = COALESCE(@Columns + ',', '') + QUOTENAME([Desc])
FROM (SELECT DISTINCT [Desc] FROM tbl) t
ORDER BY [Desc]
--Build your column headers, replacing NULL with -
SELECT @ColumnAliases = COALESCE(@ColumnAliases + ',', '')
    + 'COALESCE(CONVERT(VARCHAR,' + QUOTENAME([Desc]) + '),''-'') AS ' + QUOTENAME([Desc])
FROM (SELECT DISTINCT [Desc] FROM tbl) t
ORDER BY [Desc]
--Build your pivot query
SET @Sql = '
SELECT
Inspection,'
+ @ColumnAliases + '
FROM
tbl
PIVOT
(
MAX([Value])
FOR [Desc] IN (' + @Columns + ')
) p
'
EXEC(@Sql)
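For comparison with the loop the question describes in application code, the dynamic pivot above (headers taken from the distinct Desc values, '-' where an inspection lacks a row) can be sketched like this. It is an illustration only: the function name `pivot` and the second inspection in the sample data are hypothetical.

```python
def pivot(rows, fill="-"):
    """Pivot (inspection, desc, value) rows into one row per inspection.

    Headers come from the distinct desc values, sorted; missing cells get
    `fill`. Duplicate (inspection, desc) pairs keep the largest value,
    matching MAX([Value]) in the query above.
    """
    headers = sorted({desc for _, desc, _ in rows})
    cells_by_inspection = {}
    for inspection, desc, value in rows:
        cells = cells_by_inspection.setdefault(inspection, {})
        cells[desc] = max(value, cells.get(desc, value))
    table = [["Inspection"] + headers]
    for inspection in sorted(cells_by_inspection):
        cells = cells_by_inspection[inspection]
        table.append([inspection] + [cells.get(h, fill) for h in headers])
    return table


rows = [
    (1, "Description1", 3),
    (1, "Description2", 2),
    (1, "Description3", 5),
    (2, "Description1", 7),  # inspection 2 lacks Description2 and Description3
]
for line in pivot(rows):
    print(line)
```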
Figure 01 is a table in database, and I want to extract the data as shown in Figure 02.
Which query should I use?
Unique elements in Col_1 should become the column name for new table and elements in Col_2 should become the values as shown in Figure 02.
You can use the PIVOT function along with row_number() to get the result:
select A, B
from
(
select col_1, col_2,
row_number() over(partition by col_1 order by col_2) rn
from yourtable
) d
pivot
(
max(col_2)
for col_1 in (A, B)
) piv;
See SQL Fiddle with Demo.
Or you can use an aggregate function with a CASE expression to convert the rows into columns:
select
max(case when col_1 = 'A' then col_2 end) A,
max(case when col_1 = 'B' then col_2 end) B
from
(
select col_1, col_2,
row_number() over(partition by col_1 order by col_2) rn
from yourtable
) d
group by rn;
See SQL Fiddle with Demo
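The role of row_number() in both queries is to pair the first A value with the first B value, the second with the second, and so on. A minimal sketch of that pairing, assuming made-up sample data (the figures from the question are not available) and a hypothetical function name `columns_from_pairs`:

```python
from itertools import zip_longest


def columns_from_pairs(pairs, names):
    """Spread (col_1, col_2) pairs into one output column per name.

    Values are sorted within each name (like ORDER BY col_2 inside the
    window) and aligned by position (like GROUP BY rn). Shorter columns
    are padded with None, which plays the role of NULL.
    """
    groups = {name: [] for name in names}
    for col_1, col_2 in pairs:
        if col_1 in groups:
            groups[col_1].append(col_2)
    for values in groups.values():
        values.sort()
    return [list(row) for row in zip_longest(*(groups[name] for name in names))]


pairs = [("A", 1), ("B", 5), ("A", 2), ("B", 3), ("A", 9)]
for row in columns_from_pairs(pairs, ["A", "B"]):
    print(row)
```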
My method of paging is inefficient because it calls the same query twice, which doubles the query time. I currently call one query that joins about 5 tables together with XML search queries to allow passing a List from ASP.NET. Then I need to call exactly the same query again, except with a Count(row), to get the number of records.
For Example (I have removed bits to make it easier to read)
Main Query:
WITH Entries AS (
select row_number() over (order by DateReady desc)
as rownumber, Columns...,
from quote
join geolookup as Pickup on pickup.geoid = quote.pickupAddress
where
quote.Active=1
and //More
)
select * from entries
where Rownumber between (@pageindex - 1) * @pagesize + 1 and @pageIndex * @pageSize
end
Count Query:
select count(rowID)
from quote
join geolookup as Pickup on pickup.geoid = quote.pickupAddress
where
quote.Active=1
and //More
You could select the results of your big query into a temp table, then you could query this table for the row number and pull out the rows you need.
To do this, add the following after your select list and before the FROM:
INTO #tmpTable
Then reference your table as #tmpTable
select row_number() over (order by DateReady desc)
as rownumber, Columns...,
into #tmpTable
from quote
join geolookup as Pickup on pickup.geoid = quote.pickupAddress
where
quote.Active=1
and //More
SELECT @Count = COUNT(*) FROM #tmpTable
select * from #tmpTable
where Rownumber between (@pageindex - 1) * @pagesize + 1 and @pageIndex * @pageSize
You can set an output parameter which will hold the number of rows from the first query.
You could do something like
WITH Entries AS (
select row_number() over (order by DateReady desc)
as rownumber, Columns...,
from quote
join geolookup as Pickup on pickup.geoid = quote.pickupAddress
where
quote.Active=1
and //More
)
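select @rowcount = max(rownumber) from entries
-- note: a CTE is visible only to the single statement that follows it, so the
-- select below needs the Entries CTE repeated (or the rows kept in a temp table)
select * from entries
where Rownumber between (@pageindex - 1) * @pagesize + 1 and @pageIndex * @pageSize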
Hope this helps
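A minimal sketch of the temp-table approach suggested above: run the expensive query once, then read both the total count and the requested page from the stored result. It uses SQLite through Python's `sqlite3` purely for illustration; the schema, the sample rows, and the names `make_demo_db`, `fetch_page` and `tmp_entries` are all hypothetical stand-ins for the real quote query.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory stand-in for the quote table (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE quote (row_id INTEGER PRIMARY KEY, date_ready TEXT, active INTEGER)"
    )
    conn.executemany(
        "INSERT INTO quote VALUES (?, ?, ?)",
        [(i, f"2017-04-{i:02d}", 0 if i == 3 else 1) for i in range(1, 8)],
    )
    return conn


def fetch_page(conn, page_index, page_size):
    """Run the big query once into a temp table, then count and page from it."""
    conn.execute("DROP TABLE IF EXISTS tmp_entries")
    conn.execute(
        """
        CREATE TEMP TABLE tmp_entries AS
        SELECT ROW_NUMBER() OVER (ORDER BY date_ready DESC) AS rownumber, row_id
        FROM quote
        WHERE active = 1
        """
    )
    total = conn.execute("SELECT COUNT(*) FROM tmp_entries").fetchone()[0]
    first = (page_index - 1) * page_size + 1
    last = page_index * page_size
    rows = conn.execute(
        "SELECT row_id FROM tmp_entries WHERE rownumber BETWEEN ? AND ? ORDER BY rownumber",
        (first, last),
    ).fetchall()
    return total, [row[0] for row in rows]


conn = make_demo_db()
print(fetch_page(conn, page_index=2, page_size=2))
```

The trade-off is that the whole filtered result is materialized before paging, so this helps most when the joins are expensive relative to the number of matching rows.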