Select unique random rows from SQL Server table but always duplicates


I am not the best with SQL, but I try my best to get my problems solved. I have a table "just" with the columns ID (PK, identity), Char, Serv and Random. I want to select a random row from this table and insert it into the table "f_WinGet". So far all my procedures handle this step fine, but I always get duplicates in the second table.
First table: 84 rows
Second table: needs 35 random rows out of the 84.
I have tried many other ways, but I always get the same result. All my random procedures are bound to a button in a C# program. Everything works so far, except that I always get some duplicate rows in my table.
INSERT INTO f_TWinGet
SELECT TOP 1 Percent Char, Serv, Random
FROM ( select distinct Char, Serv, Random from dbo.just) as derived1
ORDER BY CHECKSUM(NEWID())
It would be nice if anyone has an idea how I can fix my problem. I am still trying, but I always get the same result.
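A likely explanation, assuming each button click runs the query once: TOP 1 PERCENT of 84 rows rounds up to a single row, and every click picks from all 84 rows again, so 35 clicks behave like 35 independent draws with replacement. That is the birthday problem. This short Python calculation (a sketch, not part of the original question) shows how rarely 35 such draws come out all distinct:

```python
from math import prod


def prob_all_distinct(pool_size: int, draws: int) -> float:
    """Probability that `draws` independent, uniform picks from `pool_size`
    rows are all different (the birthday problem)."""
    return prod((pool_size - i) / pool_size for i in range(draws))


p_distinct = prob_all_distinct(84, 35)
print(f"P(no duplicates)          = {p_distinct:.6f}")
print(f"P(at least one duplicate) = {1 - p_distinct:.6f}")
```

In other words, picking rows one click at a time without excluding the rows already chosen will almost always produce duplicates.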

If you want to insert 35 rows, do it all at once:
INSERT INTO f_TWinGet(char, serv, random)
SELECT TOP 35 Char, Serv, Random
FROM (select distinct Char, Serv, Random
from dbo.just
) derived1
ORDER BY CHECKSUM(NEWID());
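To see why the single statement cannot produce duplicates, here is a runnable sketch of the same query against an in-memory SQLite database. Assumptions: SQLite stands in for SQL Server, `random()` replaces `NEWID()`, `LIMIT` replaces `TOP`, and the 84 sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE just (ID INTEGER PRIMARY KEY, "Char" TEXT, Serv TEXT, "Random" TEXT)')
conn.execute('CREATE TABLE f_TWinGet ("Char" TEXT, Serv TEXT, "Random" TEXT)')
conn.executemany(
    'INSERT INTO just ("Char", Serv, "Random") VALUES (?, ?, ?)',
    [(f"char{i}", f"serv{i % 7}", f"r{i}") for i in range(84)],  # 84 distinct hypothetical rows
)

# One statement draws 35 rows without replacement, so no row can be picked twice.
conn.execute(
    """
    INSERT INTO f_TWinGet ("Char", Serv, "Random")
    SELECT "Char", Serv, "Random"
    FROM (SELECT DISTINCT "Char", Serv, "Random" FROM just) AS derived1
    ORDER BY random()
    LIMIT 35
    """
)

rows = conn.execute('SELECT "Char", Serv, "Random" FROM f_TWinGet').fetchall()
print(len(rows), "rows,", len(set(rows)), "distinct")
```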
If you really want to do them one at a time, I would suggest using not exists:
INSERT INTO f_TWinGet(char, serv, random)
SELECT TOP 1 Char, Serv, Random
FROM (select distinct Char, Serv, Random
from dbo.just
) t
WHERE not exists (select 1 from f_TWinGet f where t.char = f.char and t.serv = f.serv and t.random = f.random)
ORDER BY CHECKSUM(NEWID());
Note that char is also a data type name, so it is safer to put it in square brackets ([Char]). I am leaving the names as you have them in your query.
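The one-at-a-time version can be checked the same way by simulating 35 separate button clicks. This sketch assumes SQLite as a stand-in for SQL Server (`random()` for `NEWID()`, `LIMIT 1` for `TOP 1`) and uses hypothetical data; the `NOT EXISTS` filter is what keeps each click from re-inserting a row that is already in f_TWinGet.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE just (ID INTEGER PRIMARY KEY, "Char" TEXT, Serv TEXT, "Random" TEXT)')
conn.execute('CREATE TABLE f_TWinGet ("Char" TEXT, Serv TEXT, "Random" TEXT)')
conn.executemany(
    'INSERT INTO just ("Char", Serv, "Random") VALUES (?, ?, ?)',
    [(f"char{i}", f"serv{i % 7}", f"r{i}") for i in range(84)],  # hypothetical rows
)

# What one button click runs: insert one random row not yet in f_TWinGet.
PICK_ONE = """
INSERT INTO f_TWinGet ("Char", Serv, "Random")
SELECT t."Char", t.Serv, t."Random"
FROM (SELECT DISTINCT "Char", Serv, "Random" FROM just) AS t
WHERE NOT EXISTS (
    SELECT 1 FROM f_TWinGet AS f
    WHERE t."Char" = f."Char" AND t.Serv = f.Serv AND t."Random" = f."Random"
)
ORDER BY random()
LIMIT 1
"""

for _click in range(35):  # simulate 35 separate clicks
    conn.execute(PICK_ONE)

rows = conn.execute('SELECT "Char", Serv, "Random" FROM f_TWinGet').fetchall()
print(len(rows), "rows,", len(set(rows)), "distinct")
```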

With a table as small as yours you can use something like:
INSERT INTO f_TWinGet
SELECT TOP 1 j.Char, j.Serv, j.Random
FROM dbo.just j
LEFT JOIN f_TWinGet f
ON f.Char = j.Char
AND j.Serv = f.Serv
AND j.Random = f.Random
WHERE f.Char IS NULL
ORDER BY NEWID()
This way you make sure that the values you are trying to insert are not already in the final table.
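A quick check of this anti-join, including what happens once every row has been used. Assumptions: SQLite stands in for SQL Server (`random()` for `NEWID()`, `LIMIT 1` for `TOP 1`) and the data is hypothetical. The statement runs 85 times against 84 rows: each of the first 84 runs adds exactly one new row, and the 85th adds nothing because the `LEFT JOIN ... IS NULL` filter finds no unused row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE just (ID INTEGER PRIMARY KEY, "Char" TEXT, Serv TEXT, "Random" TEXT)')
conn.execute('CREATE TABLE f_TWinGet ("Char" TEXT, Serv TEXT, "Random" TEXT)')
conn.executemany(
    'INSERT INTO just ("Char", Serv, "Random") VALUES (?, ?, ?)',
    [(f"char{i}", f"serv{i % 7}", f"r{i}") for i in range(84)],  # hypothetical rows
)

# LEFT JOIN ... IS NULL keeps only rows of `just` that have no match in f_TWinGet.
PICK_ONE = """
INSERT INTO f_TWinGet ("Char", Serv, "Random")
SELECT j."Char", j.Serv, j."Random"
FROM just AS j
LEFT JOIN f_TWinGet AS f
  ON f."Char" = j."Char" AND j.Serv = f.Serv AND j."Random" = f."Random"
WHERE f."Char" IS NULL
ORDER BY random()
LIMIT 1
"""

# rowcount reports how many rows each execution inserted.
inserted_per_click = [conn.execute(PICK_ONE).rowcount for _ in range(85)]
total = conn.execute("SELECT COUNT(*) FROM f_TWinGet").fetchone()[0]
distinct = conn.execute(
    'SELECT COUNT(*) FROM (SELECT DISTINCT "Char", Serv, "Random" FROM f_TWinGet)'
).fetchone()[0]
print(total, "rows,", distinct, "distinct; last click inserted", inserted_per_click[-1])
```

If the application must keep running after all 84 rows are used, it can check for this "nothing inserted" case instead of expecting a new row on every click.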

Related

Update all rows in sql table with unique random value without using primary key or unique key in c#

In my application, I fetch all tables in the database.
The user selects the table name and the column names to be masked.
Now I want to update the selected columns with randomly generated strings, which must be unique for each row, without using a primary key or unique key.
For example, in my Employeedb database I have a table Employee.
Of the columns in the Employee table, I want to mask the data in the Name and City columns.
If the table contains 1000 rows, I want to replace the Name and City columns with 1000 unique values each. That means I want to update row by row.
This is the original data:
Name Address City
Raghav flatno34 mumbai
Ranveer flatno23 chennai
This is the expected output:
Name Address City
Sbgha flatno34 mmjgujj
Lkhhvh flatno23 huughh
The table sometimes has a primary key, but it may not have one.
I have one more question: I have this expected output in a DataTable. Since I cannot predefine the table name and the number of fields, how do I write the update query?

I think you will find my blog post entitled How to pre-populate a random strings pool very helpful for this requirement.
(Inspired by this SO answer from Martin Smith, to give credit where credit is due)
It describes an inline table valued user defined function that generates a table of random values, which you can use to update your data.
However, it does not guarantee uniqueness of these values. For that, you must use DISTINCT when selecting from it.
One problem you might encounter because of that is getting fewer values than you generated, but for 1,000 records per table, as you wrote in the question, it's probably not going to be a problem, since the function can generate up to 1,000,000 records each time you call it.
For the sake of completeness, I'll post the code here as well, but you should probably read the post at my blog.
Also, there's another version of this function in another blog post entitled A more controllable random string generator function for SQL Server, which gives you better control over the content of the random strings, e.g. a string containing only numbers, or only lower-case letters.
The first thing you need to do is create a view that will generate a new guid for you, because this can't be done inside a user-defined function:
CREATE VIEW GuidGenerator
AS
SELECT Newid() As NewGuid
Then, the function code: (Note: this is the simpler version)
CREATE FUNCTION dbo.RandomStringGenerator
(
@Length int,
@Count int -- Note: up to 1,000,000 rows
)
RETURNS TABLE
AS
RETURN
-- An inline tally table with 1,000,000 rows
WITH E1(N) AS (SELECT N FROM (VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)) V(N)), -- 10
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100
E3(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000
Tally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY @@SPID) FROM E3 a, E2 b) --1,000,000
SELECT TOP(@Count) (
SELECT TOP (@Length) CHAR(
-- create a random number from a guid using the GuidGenerator view, mod 3.
CASE Abs(Checksum(NewGuid)) % 3
WHEN 0 THEN 65 + Abs(Checksum(NewGuid)) % 26 -- Random upper case letter
WHEN 1 THEN 97 + Abs(Checksum(NewGuid)) % 26 -- Random lower case letter
ELSE 48 + Abs(Checksum(NewGuid)) % 10 -- Random digit
END
)
FROM Tally As t0
CROSS JOIN GuidGenerator
WHERE t0.n != -t1.n -- Needed for the subquery to get re-evaluated for each row
FOR XML PATH('')
) As RandomString
FROM Tally As t1
Then, you can use it like this to get a distinct random string:
SELECT DISTINCT RandomString
FROM dbo.RandomStringGenerator(50, 5000);
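Since DISTINCT can return fewer strings than were requested, one way to guarantee an exact count is to keep generating until enough distinct values exist. The sketch below does that client-side in Python; it is an alternative to, not a copy of, the T-SQL function above, and the helper names are hypothetical. It uses the same character set (A-Z, a-z, 0-9), though not the same per-character distribution.

```python
import secrets
import string

# Same character set as the T-SQL function: upper case, lower case, digits.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits


def random_string(length: int) -> str:
    """One random string of `length` characters from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def unique_random_strings(count: int, length: int) -> list[str]:
    """Return exactly `count` distinct random strings.

    Duplicates are discarded and new strings are generated until enough
    distinct values exist, so the result never comes up short the way a
    plain SELECT DISTINCT can.
    """
    if len(ALPHABET) ** length < count:
        raise ValueError("not enough distinct strings of this length")
    seen: set[str] = set()
    result: list[str] = []
    while len(result) < count:
        candidate = random_string(length)
        if candidate not in seen:
            seen.add(candidate)
            result.append(candidate)
    return result


masked_names = unique_random_strings(1000, 8)  # e.g. one value per Employee row
print(masked_names[:3])
```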

SQL Server : randomize rows (shuffle IDs)

Is there a way to randomize the rows in SQL Server?
I don't want to retrieve the rows in a random order; I know how to do that.
I want to shuffle the row IDs in the database (e.g. ID1 will change to ID27 and ID27 will change to ID1).
I could copy all records to a temporary table, truncate the original table and insert the records back from the temporary table using a parallel loop for randomization.
Is there an easier way to do this?
ID is an identity column (auto-incremented).

This sounds like a really strange requirement. Since the id is an identity you can't change that, so you'll have to swap all the other data on the row, which you could probably do with something like this:
select
a.id as old_id,
b.*
into #newdata
from
(
select
id,
row_number() over (order by id) as rn
from
data
) a
join (
select
*,
row_number() over (order by newid()) as rn
from
data
) b on a.rn = b.rn
This creates a temp table that pairs each existing id (old_id) with all the columns of a randomly chosen row. You could then use this temp table to update all the non-identity columns of the original table, matching on old_id.
I can't really recommend doing this, especially if there are a lot of rows. Before doing it, you should probably take a table-level exclusive lock on the table, just in case.
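The join on rn pairs the n-th id in key order with the n-th row in random order. The same pairing in a few lines of Python (hypothetical rows; `random.shuffle` stands in for `ORDER BY NEWID()`, and `shuffle_payloads` is an illustrative name) makes it easy to see that every id is kept and every row's data moves exactly once:

```python
import random


def shuffle_payloads(rows, rng=None):
    """rows: list of (id, payload) tuples.

    Returns new (id, payload) pairs where the ids stay fixed and the
    payloads are randomly permuted among them.
    """
    rng = rng or random.Random()
    ids = sorted(row_id for row_id, _ in rows)  # like row_number() over (order by id)
    shuffled = list(rows)
    rng.shuffle(shuffled)                       # like row_number() over (order by newid())
    # zip pairs position n of both lists, like "on a.rn = b.rn".
    return [(old_id, payload) for old_id, (_, payload) in zip(ids, shuffled)]


rows = [(1, "alpha"), (2, "bravo"), (3, "charlie"), (4, "delta"), (5, "echo")]
new_rows = shuffle_payloads(rows, random.Random(42))
print(new_rows)
```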

How to insert Serial Number to Unpivoted Column

I have a timetable table in my DB which stores the date and the time in separate columns. To display them as a single value in the front end, I concatenated the date and time columns, inserted the result into a temporary table and unpivoted it. However, pk_id is the same for all the unpivoted values, so in the front end, when I select an item in the drop-down list (say at index 6), after a postback the selection returns to index 1. Is there a way to add a serial number to the unpivoted rows? My UNPIVOT query is:
Select * from
(
Select pk_bid,No_of_batches,Batch1,Batch2,Batch3,Batch4 from #tempbatch
) as p
Unpivot(Batchname for [Batches] in([Batch1],[Batch2],[Batch3],[Batch4])) as UnPvt
In the above query pk_bid and No_of_batches are the same for every row, so if I use ROW_NUMBER() OVER (PARTITION BY pk_bid ORDER BY pk_bid) or ROW_NUMBER() OVER (PARTITION BY No_of_batches ORDER BY No_of_batches), I only get 1, 1 because the values are the same.

I solved the problem above like this:
I created another temporary table and generated the serial number over a column whose values differ from row to row. The query is:
Create Table #Tempbatch2
(
pk_bid int,
No_of_batches int,
Batchname Varchar(max),
[Batches] Varchar(max)
)
Insert Into #Tempbatch2
Select * from
(
Select pk_batchid,No_of_batches,Batch1,Batch2,Batch3,Batch4 from #tempbatch
) as p
Unpivot(Batchname for [Batches] in([Batch1],[Batch2],[Batch3],[Batch4])) as UnPvt
Select Row_number() OVER(ORDER BY (Batchname)) as S_No,pk_bid,No_of_batches,Batchname,[Batches] from #Tempbatch2
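Why this gives distinct numbers: ROW_NUMBER() is ordered by Batchname, which differs for every unpivoted row, even though pk_bid and No_of_batches repeat. A small Python model of the same two steps (hypothetical sample data and function names; like UNPIVOT, it skips NULL values):

```python
BATCH_COLUMNS = ["Batch1", "Batch2", "Batch3", "Batch4"]

# Hypothetical #tempbatch content: one timetable row with three batches set.
tempbatch = [
    {"pk_bid": 1, "No_of_batches": 4,
     "Batch1": "2012-06-04 09:00", "Batch2": "2012-06-04 11:00",
     "Batch3": "2012-06-05 09:00", "Batch4": None},
]


def unpivot(rows):
    """Yield (pk_bid, No_of_batches, Batchname, Batches); NULLs are dropped as UNPIVOT does."""
    for row in rows:
        for col in BATCH_COLUMNS:
            if row[col] is not None:
                yield (row["pk_bid"], row["No_of_batches"], row[col], col)


def with_serial_numbers(unpivoted):
    """Model of ROW_NUMBER() OVER (ORDER BY Batchname): number rows 1..n by Batchname."""
    ordered = sorted(unpivoted, key=lambda r: r[2])
    return [(s_no, *r) for s_no, r in enumerate(ordered, start=1)]


result = with_serial_numbers(unpivot(tempbatch))
for r in result:
    print(r)  # (S_No, pk_bid, No_of_batches, Batchname, Batches)
```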

Passing multiple rows of data to a stored procedure

I have a list of objects (created from several text files) in C#.net that I need to store in a SQL2005 database file. Unfortunately, Table-Valued Parameters began with SQL2008 so they won't help. I found from MSDN that one method is to "Bundle multiple data values into delimited strings or XML documents and then pass those text values to a procedure or statement" but I am rather new to stored procedures and need more help than that. I know I could create a stored procedure to create one record then loop through my list and add them, but that's what I'm trying to avoid. Thanks.
Input file example (Other files contain pricing and availability):
Matnr ShortDescription LongDescription ManufPartNo Manufacturer ManufacturerGlobalDescr GTIN ProdFamilyID ProdFamily ProdClassID ProdClass ProdSubClassID ProdSubClass ArticleCreationDate CNETavailable CNETid ListPrice Weight Length Width Heigth NoReturn MayRequireAuthorization EndUserInformation FreightPolicyException
10000000 A&D ENGINEERING SMALL ADULT CUFF FOR UA-767PBT UA-279 A&D ENGINEERING A&D ENG 093764011542 GENERAL General TDINTERNL TD Internal TDINTERNL TD Internal 2012-05-13 12:18:43 N 18.000 .350 N N N N
10000001 A&D ENGINEERING MEDIUM ADULT CUFF FOR UA-767PBT UA-280 A&D ENGINEERING A&D ENG 093764046070 GENERAL General TDINTERNL TD Internal TDINTERNL TD Internal 2012-05-13 12:18:43 N 18.000 .450 N N N N
Some DataBase File fields:
EffectiveDate varchar(50)
MfgName varchar(500)
MfgPartNbr varchar(500)
Cost varchar(200)
QtyOnHand varchar(200)

You can split multiple values from a single string quite easily. Say you can bundle the string like this, using a comma to separate "columns", and a semi-colon to separate "rows":
foo, 20120101, 26; bar, 20120612, 32
(This assumes that commas and semicolons can't appear naturally in the data; if they can, you'll need to choose other delimiters.)
You can build a split routine like this, which includes an output column that allows you to determine the order the value appeared in the original string:
CREATE FUNCTION dbo.SplitStrings
(
@List NVARCHAR(MAX),
@Delimiter NVARCHAR(255)
)
RETURNS TABLE
AS
RETURN (SELECT Number = ROW_NUMBER() OVER (ORDER BY Number),
Item FROM (SELECT Number, Item = LTRIM(RTRIM(SUBSTRING(@List, Number,
CHARINDEX(@Delimiter, @List + @Delimiter, Number) - Number)))
FROM (SELECT ROW_NUMBER() OVER (ORDER BY [object_id])
FROM sys.all_objects) AS n(Number)
WHERE Number <= CONVERT(INT, LEN(@List))
AND SUBSTRING(@Delimiter + @List, Number, 1) = @Delimiter
) AS y);
GO
Then you can query it like this (for simplicity and illustration I'm only handling 3 properties but you can extrapolate this for 11 or n):
DECLARE @x NVARCHAR(MAX); -- a parameter to your stored procedure
SET @x = N'foo, 20120101, 26; bar, 20120612, 32';
;WITH x AS
(
SELECT ID = s.Number, InnerID = y.Number, y.Item
-- parameter and "row" delimiter here:
FROM dbo.SplitStrings(@x, ';') AS s
-- output and "column" delimiter here:
CROSS APPLY dbo.SplitStrings(s.Item, ',') AS y
)
SELECT
prop1 = x.Item,
prop2 = x2.Item,
prop3 = x3.Item
FROM x
INNER JOIN x AS x2
ON x.InnerID = x2.InnerID - 1
AND x.ID = x2.ID
INNER JOIN x AS x3
ON x2.InnerID = x3.InnerID - 1
AND x2.ID = x3.ID
WHERE x.InnerID = 1
ORDER BY x.ID;
Results:
prop1 prop2 prop3
------ -------- -------
foo 20120101 26
bar 20120612 32
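To check the bundle format before sending it, here is a Python sketch of both sides: `build_bundle` (what the C# caller would do) and `split_bundle` (what the nested SplitStrings calls do). Both names are hypothetical. `build_bundle` also rejects values that contain a delimiter, which is the limitation noted above.

```python
def build_bundle(rows, row_sep=";", col_sep=","):
    """Join rows of values into 'a, b, c; d, e, f'.

    Values containing a delimiter are rejected, because the T-SQL split
    could not tell them apart from real separators.
    """
    for row in rows:
        for value in row:
            if row_sep in value or col_sep in value:
                raise ValueError(f"value contains a delimiter: {value!r}")
    return f"{row_sep} ".join(f"{col_sep} ".join(row) for row in rows)


def split_bundle(bundle, row_sep=";", col_sep=","):
    """Split the bundle back into rows of trimmed values (rows first, then
    columns), mirroring the two nested dbo.SplitStrings calls."""
    return [
        [item.strip() for item in row.split(col_sep)]
        for row in bundle.split(row_sep)
        if row.strip()
    ]


bundle = build_bundle([["foo", "20120101", "26"], ["bar", "20120612", "32"]])
print(bundle)
print(split_bundle(bundle))
```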

We use XML data types like this...
declare @contentXML xml
set @contentXML=convert(xml,N'<ROOT><V a="124694"/><V a="124699"/><V a="124701"/></ROOT>')
SELECT c.content_id
FROM dbo.table c WITH (nolock)
JOIN @contentXML.nodes('/ROOT/V') AS R ( v ) ON c.content_id = R.v.value('@a', 'INT')
Here is what it would look like if calling a stored procedure...
DbCommand dbCommand = database.GetStoredProcCommand("MyStoredProcedure");
database.AddInParameter(dbCommand, "dataPubXML", DbType.Xml, dataPublicationXml);
CREATE PROC dbo.usp_get_object_content
(
@contentXML XML
)
AS
BEGIN
SET NOCOUNT ON
SELECT c.content_id
FROM dbo.tblIVContent c WITH (nolock)
JOIN @contentXML.nodes('/ROOT/V') AS R ( v ) ON c.content_id = R.v.value('@a', 'INT')
END
SQL Server does not parse XML very quickly so the use of the SplitStrings function might be more performant. Just wanted to provide an alternative.
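On the calling side, the XML parameter can be built with an XML library rather than by string concatenation, which also takes care of escaping. A Python sketch using the standard `xml.etree.ElementTree` module (`build_content_xml` is a hypothetical name; in C# the same would be done with the .NET XML classes):

```python
import xml.etree.ElementTree as ET


def build_content_xml(content_ids):
    """Build <ROOT><V a="..."/>...</ROOT>, the shape the nodes('/ROOT/V') query reads."""
    root = ET.Element("ROOT")
    for content_id in content_ids:
        # Attribute "a" holds the id, read back in T-SQL with R.v.value('@a', 'INT').
        ET.SubElement(root, "V", a=str(int(content_id)))
    return ET.tostring(root, encoding="unicode")


xml_text = build_content_xml([124694, 124699, 124701])
print(xml_text)
```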

I can think of a few options, but as I was typing, one of them (the split option) was posted by Mr. Bertrand above. The only problem with it is that SQL just isn't that good at string manipulation.
So, another option would be to use a #Temp table that your sproc assumes will be present. Build dynamic SQL to the following effect:
Start a transaction, CREATE TABLE #InsertData with the shape you need, then loop over the data you are going to insert, using INSERT INTO #InsertData SELECT <values> UNION ALL SELECT <values>....
There are some limitations to this approach, one of which is that as the data set becomes very large you may need to split the INSERTs into batches. (I don't recall the specific error I got when I learned this myself, but for very long lists of values I have had SQL complain.) The solution, though, is simple: just generate a series of INSERTs with a smaller number of rows each. For instance, you might do 10 INSERT SELECTs with 1000 UNION ALLs each instead of 1 INSERT SELECT with 10000 UNION ALLs. You can still pass the entire batch as a part of a single command.
The advantage of this approach (despite its various disadvantages: the use of temporary tables, long command strings, etc.) is that it offloads all the string processing to the much more efficient C# side of the equation and doesn't require an additional persistent database object (the Split function; though, again, who doesn't need one of these sometimes?).
If you DO go with a Split() function, I'd encourage you to offload this to a SQLCLR function, and NOT a T-SQL UDF (for the performance reasons illustrated by the link above).
Finally, whatever method you choose, note that you'll have more problems if your data can include strings that contain the delimiter. For instance, Aaron's answer runs into problems if the data is:
'I pity the foo!', 20120101, 26; 'bar, I say, bar!', 20120612, 32
Again, because C# is better at string handling than T-SQL, you'll be better off without using a T-SQL UDF to handle this.
Edit
Please note the following additional point to think about for the dynamic INSERT option.
You need to decide whether any input here is potentially dangerous input and would need to be cleaned before use. You cannot easily parameterize this data, so this is a significant one. In the place I used this strategy, I already had strong guarantees about the type of the data (in particular, I have used it for seeding a table with a list of integer IDs to process, so I was iterating over integers and not arbitrary, untrusted strings). If you don't have similar assurances, be aware of the dangers of SQL injection.
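A sketch of the client-side string building for the batched INSERT approach, written in Python for brevity. `build_insert_batches`, the #InsertData table and its single Id column are hypothetical, and the batch size of 1000 follows the example above. Because the values are concatenated into SQL text, it accepts only integers, in line with the injection warning.

```python
def build_insert_batches(ids, batch_size=1000, table="#InsertData"):
    """Return one INSERT ... SELECT ... UNION ALL statement per batch of ids.

    Only real integers are accepted: the values are spliced into the SQL
    text, so arbitrary strings could open the door to SQL injection.
    """
    for value in ids:
        if not isinstance(value, int) or isinstance(value, bool):
            raise TypeError(f"only integers are allowed, got {value!r}")
    statements = []
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        selects = " UNION ALL ".join(f"SELECT {value}" for value in batch)
        statements.append(f"INSERT INTO {table} (Id) {selects};")
    return statements


statements = build_insert_batches(list(range(1, 2501)))
print(len(statements), "statements")
print(statements[0][:80], "...")
```

All statements can still be sent together as one command text, as described above.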

How to loop through array in a stored procedure, return the array?

Part of a stored procedure I'm writing (on an Oracle DB) will return an array of integer values to a C# app. I've never done this before and I can't find info online on how to do this inside the stored procedure.
On the C# side, I've connected to the DB and created a stored procedure command. I'm using:
cmd.Parameters.Add("returnID", OracleDbType.Array, ParameterDirection.Output);
To grab the array.
Inside of the Stored Procedure, I have:
CREATE OR REPLACE PROCEDURE ODM(/* not relevant*/, returnIDs OUT ARRAY)
IS
BEGIN
...
END ODM;
Where returnIDs is the array I want to output, full of integers.
I need to be able to loop through a table, ORDERS, and grab all integer primary keys between two values, and add them into returnIDs.
I'm hoping there's something similar to an insert into the array, where the primary key is between the min and max value, but I'm not sure.
What's the syntax to be able to declare those values, loop through the table and add into my output array?
EDIT: solution: Bulk Collect would work for this, but it's much easier just to return the min and max values to my program and then just do a separate select in there.

I'm not sure that you need to loop. Depending on the definition of the ARRAY type, you can probably just
SELECT primary_key
BULK COLLECT INTO returnIDs
FROM orders
WHERE primary_key BETWEEN low_value AND high_value;

So, what you need to know is:
we can insert into an array using Oracle's bulk collect syntax
we can give each row a unique number using the ROW_NUMBER() analytic function.
Put them together into a PL/SQL function like this:
SQL> create or replace type numbers_nt as table of number
2 /
Type created.
SQL> create or replace function get_range_of_numbers
2 (p_start in pls_integer
3 , p_end in pls_integer )
4 return numbers_nt
5 is
6 rv numbers_nt ;
7 begin
8 select empno
9 bulk collect into rv
10 from
11 ( select empno
12 , row_number() over (order by empno asc) rn
13 from emp )
14 where rn between p_start and p_end;
15 return rv;
16 end;
17 /
Function created.
SQL>
Let's rock!
SQL> select *
2 from table(get_range_of_numbers(5, 8))
3 /
COLUMN_VALUE
------------
7654
7698
7782
7788
SQL>
Hmmm, I think I misread your question. You probably want to select records on the basis of key value rather than row position. In which case, the function should simply be
create or replace function get_range_of_numbers
(p_start in pls_integer
, p_end in pls_integer )
return numbers_nt
is
rv numbers_nt ;
begin
select empno
bulk collect into rv
from emp
where empno between p_start and p_end;
return rv;
end;
/
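The difference between the two versions is easy to miss, so here is a Python model of both filters. It assumes EMP holds the standard SCOTT demo EMPNO values, which is consistent with the 7654 to 7788 output shown earlier; the function names are hypothetical.

```python
# EMPNO values of the classic SCOTT.EMP demo table (assumption), in key order.
EMPNOS = [7369, 7499, 7521, 7566, 7654, 7698, 7782, 7788,
          7839, 7844, 7876, 7900, 7902, 7934]


def by_row_position(keys, p_start, p_end):
    """First version: row_number() over (order by empno) between p_start and p_end."""
    ordered = sorted(keys)
    return [k for rn, k in enumerate(ordered, start=1) if p_start <= rn <= p_end]


def by_key_value(keys, p_start, p_end):
    """Corrected version: empno between p_start and p_end."""
    return sorted(k for k in keys if p_start <= k <= p_end)


print(by_row_position(EMPNOS, 5, 8))     # positions 5 to 8
print(by_key_value(EMPNOS, 7600, 7800))  # keys between 7600 and 7800
print(by_key_value(EMPNOS, 5, 8))        # no EMPNO lies between 5 and 8
```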

My DB expertise is primarily in SQL Server and Firebird, and I am not a skilled Oracle person. However, I was just curious: can't you just select the values and return them as a simple DataTable or DataSet to the C# application? Then you can either keep them in the DataTable or convert them to an array or collection in the C# app, however you like.
