Reversing cross join input - c#

A table is populated by the following stored procedure:
exec('
insert into tblSegments
(SegmentName, CarTypeID, EngineTypeID, AxleTypeID)
select distinct
''' + @SegmentName + '''
, CT.CarTypeID
, ET.EngineTypeID
, AT.AxleTypeID
from
tblCarTypes CT
cross join tblEngineTypes ET
cross join tblAxleTypes AT
where
CT.CarTypeName in (' + @CarTypes + ')
and ET.EngineTypeName in (' + @EngineTypes + ')
and AT.AxleTypeName in (' + @AxleTypes + ')
')
The parameters, with the exception of @SegmentName, are comma-delimited strings such as (for @CarTypes) 'hatchback','suv','sedan'.
Can the data in the table be used to create a list, for a single SegmentName, of the previous inputs to the stored procedure, akin to:
Run1: #CarTypes, #EngineTypes, #AxleTypes
Run2: #CarTypes, #EngineTypes, #AxleTypes
Run3: #CarTypes, #EngineTypes, #AxleTypes
...?
Runs don't need to be in sequential order. The process can involve a combination of T-SQL and C#. I'm pretty sure this is impossible; perhaps someone can prove me wrong.

No, it's not possible because you're taking in a potentially comma-delimited string of values which will create separate rows in your result table. You can easily get a single value each for the CarTypes, EngineTypes and AxleTypes variables, but to group them separately by each execution of your dynamic SQL you would need some kind of executionID column or something to group the rows on per execution.
So you're correct: what you want to do is not possible with the schema design you've provided, though it is completely possible with a different design. I would just create another table and populate it at runtime if this is information you want to keep. You could put an identity column on the table that houses the input variables and use @@IDENTITY from the insert into that table to populate an executionID column in your main table, so you can easily associate the variable summary table with the cross-joined result table.
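For illustration, here is a minimal sketch of that design (the tblSegmentRuns table, its columns, and the use of SCOPE_IDENTITY() in place of @@IDENTITY are assumptions, not part of the original schema):
-- run-log table: one row per execution of the stored procedure
create table tblSegmentRuns
(
ExecutionID int identity(1,1) primary key,
SegmentName varchar(100),
CarTypes varchar(max),
EngineTypes varchar(max),
AxleTypes varchar(max)
)
-- inside the procedure, before the dynamic insert:
insert into tblSegmentRuns (SegmentName, CarTypes, EngineTypes, AxleTypes)
values (@SegmentName, @CarTypes, @EngineTypes, @AxleTypes)
declare @ExecutionID int
set @ExecutionID = scope_identity()
-- then stamp @ExecutionID into a new ExecutionID column of tblSegments
-- as part of the dynamic cross-join insert
With that in place, the Run1/Run2/Run3 listing is just a query against tblSegmentRuns, or a GROUP BY ExecutionID on tblSegments.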

Related

How to generate an integer unique id in a multi-user application without duplication in C#/SqlServer?

Scenario
I want to generate a unique integer key (not a database primary key) by extracting max() and incrementing it by one. Then I want to use that integer key for insert/update operations in one or more tables. I prefer to do this in C# code, but if there's no other option I could go for SQL batch statements or a stored procedure instead.
Question
Since multiple users can be doing this at the same time, how do I ensure that no two users get the same max()?
Sample pseudocode
Let's say there are two tables - Employee (Columns: EmpId, BatchId, Name) and MiscData (Columns: EmpId, BatchId).
Below is C#-inspired pseudocode that shows one implementation of this scenario:
void DoOperation(int empId, string[] names)
{
    int maxBatchId = repository.GetMaxBatchId(); // executes: select max(BatchId) from Employee
    maxBatchId++;
    foreach (string name in names)
        command.ExecuteNonQuery("insert into Employee (BatchId, Name) values (" + maxBatchId + ", '" + name + "')");
    command.ExecuteNonQuery("insert into MiscData select EmpId, " + maxBatchId + " from Employee where EmpId = " + empId);
}
The method DoOperation above can perform whatever database operations are needed based on the value of maxBatchId + 1.
If more than one user runs DoOperation(...) at the same time, they're likely to get exactly the same maxBatchId. How do I ensure that only a single instance of this method can run at one time?
You can use a SEQUENCE in SQL Server:
CREATE SEQUENCE testseq
START WITH 1
INCREMENT BY 1;
SELECT NEXT VALUE FOR testseq
I cannot use a sequence in SQL Server because I cannot have gaps in the numbers, due to regulatory rules.
This is an old post, but while I was reading it to find an answer I thought of a different way that might actually work, and I'm posting it here to help others.
If you create a new IDENTITY(1,1) column in the table, then every record you save will get a unique value.
BUT you may not be able to modify the database table as it may affect other areas...
So what you can do is create a new table with three fields: ID, Username, CurrentDateTime.
The ID field is incrementing(1,1)
When you want a unique ID, simply add a record to that table with the current date and time and the username; that generates the unique ID, and you can then run:
Select Top 1 ID From (your new table) Where UserName = 'XXX' Order By CurrentDateTime Desc
That will give you that user's last entry in this table, which you can use as your unique integer key.
Pros:
A unique identifier that will never duplicate in a multi-user environment.
If you have records already created, you can insert them into this table after the fact and start using this method.
Cons:
An extra table.
The extra table will take almost no space in your database and any old systems won't be affected by this new table. New systems might complain that the table should not exist and of course this needs to be tested.
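As a sketch, the helper table and lookup described above could look like this (the table and user names are illustrative, since the answer leaves them unnamed):
CREATE TABLE dbo.KeyGenerator
(
ID INT IDENTITY(1,1) PRIMARY KEY,
UserName VARCHAR(50) NOT NULL,
CurrentDateTime DATETIME NOT NULL DEFAULT GETDATE()
)
-- request a new unique key for a user
INSERT INTO dbo.KeyGenerator (UserName) VALUES ('jsmith')
-- read back that user's latest key
SELECT TOP 1 ID
FROM dbo.KeyGenerator
WHERE UserName = 'jsmith'
ORDER BY CurrentDateTime DESC
Note that SELECT SCOPE_IDENTITY() immediately after the INSERT is a safer read-back than the TOP 1 lookup, since two inserts by the same user in the same instant could otherwise be confused.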

How to join a single table to itself in SQL Server 2008?

I am using the table structure below. I want to create a view which will show the FirstName for the ReportsTo field shown below.
Please give me a suggestion on how to create that view, which should display all of the ReportsTo first names with a comma (',') separator.
You join a table to itself just like any other join. The main thing is to make sure the two references to the table are given different aliases.
Your problem is that you have a one-to-many relationship stored in the table, which is a huge design mistake. For the future, remember that any time you think about storing information in a comma-delimited list, you are doing it wrong and need a related table instead. So first you have to split the data out into the related table you should have had in the first place, with two columns, EmpCode and ReportsTo (with only one value in ReportsTo); then you can do the join just like any other join. We use a function that you can find by searching around the internet, called fn_split, to split out such tables when we get this type of information in client files.
If you search out fn_split, then this is how you can apply it:
Create table #UnsplitData (EmpCode varchar (10), ReportsTo varchar(20), FirstName varchar (10))
insert into #UnsplitData
values ('emp_0101', 'emp_0102,emp_0103', 'John')
, ('emp_0102', 'emp_0103', 'Sally')
, ('emp_0103', Null, 'Steve')
select *, employee.FirstName + ', ' + Reports.FirstName
from #UnsplitData Employee
join
(
select t.EmpCode , split.value as Reportsto, ReportName.Firstname
from #UnsplitData t
cross apply dbo.fn_Split( ReportsTo, ',') split
join #UnsplitData ReportName
on ReportName.EmpCode = split.value
) Reports
On Employee.EmpCode = Reports.empcode
From what I gather, I think you're trying to get the Firstname column and the ReportsTo column separated by a comma:
SELECT FirstName + ', ' + ReportsTo
FROM table
Edit: judging from the comments he's trying to do something else? Can someone rephrase for me?
SELECT E.*,
R.FirstName
FROM Employees E
JOIN Employees R
ON E.ReportsTo LIKE '%' + R.EmpCode + '%'

Prefix every column name with a specific string?

I'm trying to manually map some rows to instances of their appropriate classes. I know that I need to use every column of every table, and map all of those columns from one table into a given class.
However, I was wondering if there would be an easier way to do it. Right now, I have a class called School and a class called User. Each of these classes has a Name property, and other properties (but the `Name` one is the important one, since it is a name both classes share).
Right now, I am doing the following to map them down.
SELECT u.SomeOtherColumn, u.Name AS userName, s.SomeOtherColumn, s.Name AS schoolName FROM User AS u INNER JOIN School AS s ON something
I would love to do the following, but I can't, since Name is a mutual name between the classes.
SELECT u.*, s.* FROM User AS u INNER JOIN School AS s ON something
This however generates an error since they both have the column Name. Can I prefix them somehow? Like this for instance?
u.user_*, s.school_*
So that every column of each of those tables have a prefix? For instance user_Name and school_Name?
Years ago I wrote a bunch of functions and procedures to help me with developing automatic code-generation routines for SQL Servers and applications using dynamic SQL. Here is the one that I think would be most helpful to your situation:
Create FUNCTION [dbo].[ColumnString2]
(
@TableName As SYSNAME, --table or view whose column names you want
@Template As NVarchar(MAX), --replaces '{c}' with the name for every column
@Between As NVarchar(MAX) --puts this string between every column string
)
RETURNS NVarchar(MAX) AS
BEGIN
DECLARE @str As NVarchar(MAX);
SELECT TOP 999
@str = COALESCE(
@str + @Between + REPLACE(@Template,N'{c}',COLUMN_NAME),
REPLACE(@Template,N'{c}',COLUMN_NAME)
)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = COALESCE(PARSENAME(@TableName, 2), N'dbo')
And TABLE_NAME = PARSENAME(@TableName, 1)
ORDER BY ORDINAL_POSITION
RETURN @str;
END
This allows you to format all of the column names of a table or view any way that you want. Simply pass it a table name, and a Template string with '{c}' everywhere that you want the column name inserted for each column. It will do this for every column in @TableName, and add the @Between string in between them.
Here is an example of how to vertically format all of the column names for a table, renaming them with a prefix in a way that is suitable for inclusion into a SELECT query:
SELECT dbo.[ColumnString2](N'yourTable', N'
{c} As prefix_{c}', N',')
This function was intended for use with dynamic SQL, but you can use it too by executing it in Management Studio with your output set to Text (instead of Grid). Then cut and paste the output into your desired query, view or code text. (Be sure to change your SSMS Query options for Text Results to raise the "maximum number of characters displayed" from 256 to the max (8000). If that still gets cut off for you, then you can change this procedure to a function that outputs each column as a separate row, instead of as one single large string.)
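For example, run against a hypothetical table with columns ID, Name and City, the example call above would print:
ID As prefix_ID,
Name As prefix_Name,
City As prefix_City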

Passing multiple rows of data to a stored procedure

I have a list of objects (created from several text files) in C#.net that I need to store in a SQL2005 database file. Unfortunately, Table-Valued Parameters began with SQL2008 so they won't help. I found from MSDN that one method is to "Bundle multiple data values into delimited strings or XML documents and then pass those text values to a procedure or statement" but I am rather new to stored procedures and need more help than that. I know I could create a stored procedure to create one record then loop through my list and add them, but that's what I'm trying to avoid. Thanks.
Input file example (Other files contain pricing and availability):
Matnr ShortDescription LongDescription ManufPartNo Manufacturer ManufacturerGlobalDescr GTIN ProdFamilyID ProdFamily ProdClassID ProdClass ProdSubClassID ProdSubClass ArticleCreationDate CNETavailable CNETid ListPrice Weight Length Width Heigth NoReturn MayRequireAuthorization EndUserInformation FreightPolicyException
10000000 A&D ENGINEERING SMALL ADULT CUFF FOR UA-767PBT UA-279 A&D ENGINEERING A&D ENG 093764011542 GENERAL General TDINTERNL TD Internal TDINTERNL TD Internal 2012-05-13 12:18:43 N 18.000 .350 N N N N
10000001 A&D ENGINEERING MEDIUM ADULT CUFF FOR UA-767PBT UA-280 A&D ENGINEERING A&D ENG 093764046070 GENERAL General TDINTERNL TD Internal TDINTERNL TD Internal 2012-05-13 12:18:43 N 18.000 .450 N N N N
Some DataBase File fields:
EffectiveDate varchar(50)
MfgName varchar(500)
MfgPartNbr varchar(500)
Cost varchar(200)
QtyOnHand varchar(200)
You can split multiple values from a single string quite easily. Say you can bundle the string like this, using a comma to separate "columns", and a semi-colon to separate "rows":
foo, 20120101, 26; bar, 20120612, 32
(This assumes that commas and semi-colons can't appear naturally in the data; if they can, you'll need to choose other delimiters.)
You can build a split routine like this, which includes an output column that allows you to determine the order the value appeared in the original string:
CREATE FUNCTION dbo.SplitStrings
(
@List NVARCHAR(MAX),
@Delimiter NVARCHAR(255)
)
RETURNS TABLE
AS
RETURN (SELECT Number = ROW_NUMBER() OVER (ORDER BY Number),
Item FROM (SELECT Number, Item = LTRIM(RTRIM(SUBSTRING(@List, Number,
CHARINDEX(@Delimiter, @List + @Delimiter, Number) - Number)))
FROM (SELECT ROW_NUMBER() OVER (ORDER BY [object_id])
FROM sys.all_objects) AS n(Number)
WHERE Number <= CONVERT(INT, LEN(@List))
AND SUBSTRING(@Delimiter + @List, Number, 1) = @Delimiter
) AS y);
GO
Then you can query it like this (for simplicity and illustration I'm only handling 3 properties but you can extrapolate this for 11 or n):
DECLARE @x NVARCHAR(MAX); -- a parameter to your stored procedure
SET @x = N'foo, 20120101, 26; bar, 20120612, 32';
;WITH x AS
(
SELECT ID = s.Number, InnerID = y.Number, y.Item
-- parameter and "row" delimiter here:
FROM dbo.SplitStrings(@x, ';') AS s
-- output and "column" delimiter here:
CROSS APPLY dbo.SplitStrings(s.Item, ',') AS y
)
)
SELECT
prop1 = x.Item,
prop2 = x2.Item,
prop3 = x3.Item
FROM x
INNER JOIN x AS x2
ON x.InnerID = x2.InnerID - 1
AND x.ID = x2.ID
INNER JOIN x AS x3
ON x2.InnerID = x3.InnerID - 1
AND x2.ID = x3.ID
WHERE x.InnerID = 1
ORDER BY x.ID;
Results:
prop1 prop2 prop3
------ -------- -------
foo 20120101 26
bar 20120612 32
We use XML data types like this...
declare @contentXML xml
set @contentXML = convert(xml, N'<ROOT><V a="124694"/><V a="124699"/><V a="124701"/></ROOT>')
SELECT c.content_id
FROM dbo.table c WITH (nolock)
JOIN @contentXML.nodes('/ROOT/V') AS R ( v ) ON c.content_id = R.v.value('@a', 'INT')
Here is what it would look like if calling a stored procedure...
DbCommand dbCommand = database.GetStoredProcCommand("MyStoredProcedure");
database.AddInParameter(dbCommand, "dataPubXML", DbType.Xml, dataPublicationXml);
CREATE PROC dbo.usp_get_object_content
(
@contentXML XML
)
AS
BEGIN
SET NOCOUNT ON
SELECT c.content_id
FROM dbo.tblIVContent c WITH (nolock)
JOIN @contentXML.nodes('/ROOT/V') AS R ( v ) ON c.content_id = R.v.value('@a', 'INT')
END
SQL Server does not parse XML very quickly so the use of the SplitStrings function might be more performant. Just wanted to provide an alternative.
I can think of a few options, but as I was typing, one of them (the Split option) was posted by Mr. @Bertrand above. The only problem with it is that SQL just isn't that good at string manipulation.
So, another option would be to use a #Temp table that your sproc assumes will be present. Build dynamic SQL to the following effect:
Start a transaction, CREATE TABLE #InsertData with the shape you need, then loop over the data you are going to insert, using INSERT INTO #InsertData SELECT <values> UNION ALL SELECT <values>....
There are some limitations to this approach, one of which is that as the data set becomes very large you may need to split the INSERTs into batches. (I don't recall the specific error I got when I learned this myself, but for very long lists of values I have had SQL complain.) The solution, though, is simple: just generate a series of INSERTs with a smaller number of rows each. For instance, you might do 10 INSERT SELECTs with 1000 UNION ALLs each instead of 1 INSERT SELECT with 10000 UNION ALLs. You can still pass the entire batch as a part of a single command.
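As a sketch, the generated script would have this shape (the two-column subset and the procedure name are placeholders for illustration):
BEGIN TRANSACTION
CREATE TABLE #InsertData (Matnr VARCHAR(20), ListPrice VARCHAR(20))
INSERT INTO #InsertData
SELECT '10000000', '18.000' UNION ALL
SELECT '10000001', '18.000'
-- ...further INSERT INTO #InsertData batches of ~1000 rows each
EXEC dbo.YourProc -- the sproc that assumes #InsertData is present
COMMIT TRANSACTION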
The advantage of this (despite its various disadvantages: the use of temporary tables, long command strings, etc.) is that it offloads all the string processing to the much more efficient C# side of the equation and doesn't require an additional persistent database object (the Split function; though, again, who doesn't need one of these sometimes?).
If you DO go with a Split() function, I'd encourage you to offload this to a SQLCLR function, and NOT a T-SQL UDF (for the performance reasons illustrated by the link above).
Finally, whatever method you choose, note that you'll have more problems if your data can include strings that contain the delimiter. For instance, in Aaron's answer you run into problems if the data is:
'I pity the foo!', 20120101, 26; 'bar, I say, bar!', 20120612, 32
Again, because C# is better at string handling than T-SQL, you'll be better off without using a T-SQL UDF to handle this.
Edit
Please note the following additional point to think about for the dynamic INSERT option.
You need to decide whether any input here is potentially dangerous and would need to be cleaned before use. You cannot easily parameterize this data, so this is a significant consideration. In the place I used this strategy, I already had strong guarantees about the type of the data (in particular, I used it for seeding a table with a list of integer IDs to process, so I was iterating over integers and not arbitrary, untrusted strings). If you don't have similar assurances, be aware of the dangers of SQL injection.

how to improve SQL query performance in my case

I have a table whose schema is very simple: an ID column as unique primary key (uniqueidentifier type) and some other nvarchar columns. My current goal is, for 5000 inputs, to calculate which ones are already contained in the table and which are not. The inputs are strings, and I have a C# function which converts a string into a uniqueidentifier (GUID). My logic is: if there is an existing ID, then I treat the string as already contained in the table.
My question is: if I need to find out which of the 5000 input strings are already contained in the DB and which are not, what is the most efficient way?
BTW: my current implementation is to convert the string to a GUID using C# code, then invoke/implement a stored procedure which queries whether the ID exists in the database and returns the result to the C# code.
My working environment: VSTS 2008 + SQL Server 2008 + C# 3.5.
My first instinct would be to pump your 5000 inputs into a single-column temporary table X, possibly index it, and then use:
SELECT X.thecol
FROM X
JOIN ExistingTable E ON E.thecol = X.thecol
to get the ones that are present, and (if both sets are needed)
SELECT X.thecol
FROM X
LEFT JOIN ExistingTable E ON E.thecol = X.thecol
WHERE E.thecol IS NULL
to get the ones that are absent. Worth benchmarking, at least.
Edit: as requested, here are some good docs & tutorials on temp tables in SQL Server. Bill Graziano has a simple intro covering temp tables, table variables, and global temp tables. Randy Dyess and SQL Master discuss performance issues for and against them (but remember that if you're getting performance problems you do want to benchmark alternatives, not just go on theoretical considerations!).
MSDN has articles on tempdb (where temp tables are kept) and optimizing its performance.
Step 1. Make sure you have a problem to solve. Five thousand inserts isn't a lot to do one at a time in many contexts.
Are you certain that the simplest way possible isn't sufficient? What performance issues have you measured so far?
What do you need to do with the entries that do or don't exist in your table?
Depending on what you need, maybe the new MERGE statement in SQL Server 2008 could fit your bill - update what's already there, insert new stuff, all wrapped neatly into a single SQL statement. Check it out!
http://blogs.conchango.com/davidportas/archive/2007/11/14/SQL-Server-2008-MERGE.aspx
http://www.sql-server-performance.com/articles/dba/SQL_Server_2008_MERGE_Statement_p1.aspx
http://blogs.msdn.com/brunoterkaly/archive/2008/11/12/sql-server-2008-merge-capability.aspx
Your statement would look something like this:
MERGE INTO
(your target table) AS t
USING
(your source table, e.g. a temporary table) AS s
ON t.ID = s.ID
WHEN NOT MATCHED THEN -- row does not exist in base table
....(do whatever you need to do)
WHEN MATCHED THEN -- row exists in base table
... (do whatever else you need to do)
;
To make this really fast, I would load the "new" records from e.g. a TXT or CSV file into a temporary table in SQL server using BULK INSERT:
BULK INSERT YourTemporaryTable
FROM 'c:\temp\yourimportfile.csv'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
BULK INSERT combined with MERGE should give you the best performance you can get on this planet :-)
Marc
PS: here's a note from TechNet on MERGE performance and why it's faster than individual statements:
In SQL Server 2008, you can perform multiple data manipulation language (DML) operations in a single statement by using the MERGE statement. For example, you may need to synchronize two tables by inserting, updating, or deleting rows in one table based on differences found in the other table. Typically, this is done by executing a stored procedure or batch that contains individual INSERT, UPDATE, and DELETE statements. However, this means that the data in both the source and target tables are evaluated and processed multiple times; at least once for each statement.
By using the MERGE statement, you can replace the individual DML statements with a single statement. This can improve query performance because the operations are performed within a single statement, therefore, minimizing the number of times the data in the source and target tables are processed. However, performance gains depend on having correct indexes, joins, and other considerations in place. This topic provides best practice recommendations to help you achieve optimal performance when using the MERGE statement.
Try to ensure you end up running only one query - i.e. if your solution consists of running 5000 queries against the database, that'll probably be the biggest consumer of resources for the operation.
If you can insert the 5000 IDs into a temporary table, you could then write a single query to find the ones that don't exist in the database.
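A sketch of that approach, assuming the converted GUIDs are loaded into a temp table #Input and the target table is called YourTable with primary key column ID:
CREATE TABLE #Input (ID UNIQUEIDENTIFIER PRIMARY KEY)
-- bulk-load the 5000 converted GUIDs into #Input from C#, then:
SELECT i.ID
FROM #Input i
WHERE NOT EXISTS (SELECT 1 FROM YourTable t WHERE t.ID = i.ID)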
If you want simplicity, since 5000 records is not very many, then from C# just use a loop to generate an insert statement for each of the strings you want to add to the table. Wrap each insert in a TRY CATCH block. Send them all up to the server in one shot, like this:
BEGIN TRY
INSERT INTO table (theCol, field2, field3)
SELECT theGuid, value2, value3
END TRY BEGIN CATCH END CATCH
BEGIN TRY
INSERT INTO table (theCol, field2, field3)
SELECT theGuid, value2, value3
END TRY BEGIN CATCH END CATCH
BEGIN TRY
INSERT INTO table (theCol, field2, field3)
SELECT theGuid, value2, value3
END TRY BEGIN CATCH END CATCH
If you have a unique index or primary key defined on your string GUID, then the duplicate inserts will fail. Checking ahead of time to see if the record does not exist just duplicates work that SQL is going to do anyway.
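If the column isn't already constrained, a unique index along these lines (names taken from the snippet above) is what makes the duplicate inserts fail:
CREATE UNIQUE INDEX IX_table_theCol ON [table] (theCol)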
If performance is really important, then consider downloading the 5000 GUIDs to your local station and doing all the analysis locally. Reading 5000 GUIDs should take much less than 1 second. This is simpler than bulk importing to a temp table (which is the only way you will get performance from a temp table) and doing an update using a join to the temp table.
Since you are using SQL Server 2008, you could use table-valued parameters. It's a way to provide a table as a parameter to a stored procedure.
Using ADO.NET you could easily pre-populate a DataTable and pass it as a SqlParameter.
Steps you need to perform:
Create a custom Sql Type
CREATE TYPE MyType AS TABLE
(
UniqueId INT NOT NULL,
[Column] NVARCHAR(255) NOT NULL
)
Create a stored procedure which accepts the Type
CREATE PROCEDURE spInsertMyType
@Data MyType READONLY
AS
xxxx
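The body marked xxxx depends on what you need to do with the rows; a minimal sketch that just copies them into an assumed target table dbo.MyTable would be:
CREATE PROCEDURE spInsertMyType
@Data MyType READONLY
AS
BEGIN
-- copy the table-valued parameter into the assumed target table
INSERT INTO dbo.MyTable (UniqueId, [Column])
SELECT UniqueId, [Column]
FROM @Data
END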
Call using C#
SqlCommand insertCommand = new SqlCommand(
"spInsertMyType", connection);
insertCommand.CommandType = CommandType.StoredProcedure;
SqlParameter tvpParam =
insertCommand.Parameters.AddWithValue(
"@Data", dataTable);
tvpParam.SqlDbType = SqlDbType.Structured;
Links: Table-valued Parameters in Sql 2008
Definitely do not do it one-by-one.
My preferred solution is to create a stored procedure with one parameter that can take an XML document in the following format:
<ROOT>
<MyObject ID="60EAD98F-8A6C-4C22-AF75-000000000000"/>
<MyObject ID="60EAD98F-8A6C-4C22-AF75-000000000001"/>
....
</ROOT>
Then in the procedure, which takes an argument of type NVARCHAR(MAX), you convert it to XML, after which you use it as a table with a single column (let's call it @FilterTable). The stored procedure looks like:
CREATE PROCEDURE dbo.sp_MultipleParams(@FilterXML NVARCHAR(MAX))
AS BEGIN
SET NOCOUNT ON
DECLARE @x XML
SELECT @x = CONVERT(XML, @FilterXML)
-- table variable (must have it, because cannot join on XML statement)
DECLARE @FilterTable TABLE (
"ID" UNIQUEIDENTIFIER
)
-- insert into table variable
-- important: XML is case-sensitive
INSERT @FilterTable
SELECT x.value('@ID', 'UNIQUEIDENTIFIER')
FROM @x.nodes('/ROOT/MyObject') AS R(x)
SELECT o.ID,
SIGN(SUM(CASE WHEN t.ID IS NULL THEN 0 ELSE 1 END)) AS FoundInDB
FROM @FilterTable o
LEFT JOIN dbo.MyTable t
ON o.ID = t.ID
GROUP BY o.ID
END
GO
You run it as:
EXEC sp_MultipleParams '<ROOT><MyObject ID="60EAD98F-8A6C-4C22-AF75-000000000000"/><MyObject ID="60EAD98F-8A6C-4C22-AF75-000000000002"/></ROOT>'
And your results look like:
ID FoundInDB
------------------------------------ -----------
60EAD98F-8A6C-4C22-AF75-000000000000 1
60EAD98F-8A6C-4C22-AF75-000000000002 0
