SQL ORDER BY within OVER clause incompatible with CLR aggregation?

Preamble
I have been investigating a concept, and what I am posting below is a cut-down version of what I have been trying. If you look at it and think "That doesn't make any sense to do it that way", then it is probably because it doesn't make any sense; there may be more efficient ways of doing this. I just wanted to try it out because it looks interesting.
What I am attempting to do is to perform arbitrary calculations using CLR custom aggregations in SQL, using a Reverse-Polish-like implementation. I'm using the following data:
K | Amt | Instruction | Order
--+-----+-------------+------
A | 100 | Push | 1
A | 1 | Multiply | 2
A | 10 | Push | 3
A | 2 | Multiply | 4
A | | Add | 5
A | 1 | Push | 6
A | 3 | Multiply | 7
A | | Add | 8
The result of the calculation should be 123 ( = (100 * 1) + (10 * 2) + (1 * 3) ).
Using the following SQL (and the CLR functions ReversePolishAggregate and ToReversePolishArguments* that I have written) I can get the correct result:
SELECT K
, dbo.ReversePolishAggregate(dbo.ToReversePolishArguments(Instruction, Amt))
FROM dbo.ReversePolishCalculation
GROUP BY K;
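For readers who want to check the arithmetic, the stack semantics implied by the data table can be sketched in a few lines of Python. This is not the CLR aggregate from the question; the function name `evaluate` and the reading of each instruction (Multiply applies the row's amount to the top of the stack, Add combines the two topmost values) are assumptions inferred from the expected result of 123.

```python
def evaluate(rows):
    """Evaluate (amount, instruction) pairs that are already in execution order.

    Hypothetical helper: the instruction semantics are inferred from the
    sample data, not taken from the original CLR code.
    """
    stack = []
    for amount, instruction in rows:
        if instruction == "Push":
            stack.append(amount)
        elif instruction == "Multiply":
            # Multiply the value on top of the stack by this row's amount.
            stack.append(stack.pop() * amount)
        elif instruction == "Add":
            # Add the two topmost values; the row itself carries no amount.
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unknown instruction: {instruction}")
    if len(stack) != 1:
        raise ValueError("calculation did not reduce to a single value")
    return stack[0]


# The rows for K = 'A', listed in [Order] sequence.
rows = [
    (100, "Push"), (1, "Multiply"),
    (10, "Push"), (2, "Multiply"), (None, "Add"),
    (1, "Push"), (3, "Multiply"), (None, "Add"),
]
result = evaluate(rows)
print(result)  # 123
```

The sketch only works because the rows are fed in [Order] sequence, which is exactly the guarantee that a plain GROUP BY does not give.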
The Problem
I want to generalise the solution more by putting the instructions and order in a separate table so that I can create calculations on arbitrary data. For example, I was thinking of a table like this:
Item | Type | Amount
-----+----------+-------
A | Budgeted | 10
A | Actual | 12
B | Actual | 20
B | Budgeted | 18
and joining it to a calculation table like this
Type | Instruction | Order
---------+-------------+------
Budgeted | Push | 1
Actual | Minus | 2
to calculate whether each item is over or under budget. The important consideration is that minus is non-commutative, so I need to specify the order to ensure that the actual amount is subtracted from the budgeted amount, not the other way around. I expected that I would be able to do this with the ORDER BY clause inside the OVER clause of the aggregation (and then a little more tweaking of that result).
SELECT K
, dbo.[ReversePolishAggregate](
dbo.[ToReversePolishArguments](Instruction, Amt))
OVER (PARTITION BY K ORDER by [Order])
FROM dbo.ReversePolishCalculation;
However I get the error:
Incorrect syntax near the keyword 'ORDER'.
I have checked the syntax by running the following SQL statement
SELECT K
, SUM(Amt) OVER (PARTITION BY K ORDER BY [Order])
FROM dbo.ReversePolishCalculation;
This works fine (it parses and runs, although I'm not sure that the result is meaningful), so I am left assuming that this is a problem with custom CLR aggregations or functions.
My Questions
Is this supposed to work?
Is there any documentation saying explicitly that this is not supported?
Have I got the syntax right?
I'm using Visual Studio 2012 Premium, SQL Server Management Studio 2012 and .NET framework 4.0.
* I created 2 CLR functions as a work-around to not being able to pass multiple arguments into a single aggregation function - see this article.
EDIT: This post suggests that it is not supported, but I was hoping for something a little more official.

As an answer:
Officially from Microsoft, No.
http://connect.microsoft.com/SQLServer/feedback/details/611026/add-support-for-over-order-by-for-clr-aggregate-functions
It doesn't appear to have made it into SQL Server 2014 (or 2016 CTP 3) either; the list of Transact-SQL changes does not mention it:
http://msdn.microsoft.com/en-us/library/bb510411.aspx#TSQL
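If the ordering cannot be expressed inside the aggregate, one possible fallback is to fetch the joined rows and apply the order in client code before evaluating. The Python sketch below illustrates that idea with the Budgeted/Actual sample from the question; the names `CALCULATION` and `evaluate_item`, and the reading of Minus as "subtract this row's amount from the top of the stack", are assumptions made for illustration, not part of any SQL Server API.

```python
from collections import defaultdict

# Hypothetical in-memory copies of the two tables from the question.
ITEMS = [("A", "Budgeted", 10), ("A", "Actual", 12),
         ("B", "Actual", 20), ("B", "Budgeted", 18)]
CALCULATION = {"Budgeted": ("Push", 1), "Actual": ("Minus", 2)}


def evaluate_item(rows):
    """rows: (order, amount, instruction) tuples in arbitrary order."""
    stack = []
    # Sorting by the explicit order column is what ORDER BY would have done.
    for _, amount, instruction in sorted(rows, key=lambda r: r[0]):
        if instruction == "Push":
            stack.append(amount)
        elif instruction == "Minus":
            # Assumed semantics: top of stack minus this row's amount.
            stack.append(stack.pop() - amount)
        else:
            raise ValueError(f"unknown instruction: {instruction}")
    return stack[0]


# Join each item row to its instruction, then group by item.
grouped = defaultdict(list)
for item, kind, amount in ITEMS:
    instruction, order = CALCULATION[kind]
    grouped[item].append((order, amount, instruction))

results = {item: evaluate_item(rows) for item, rows in grouped.items()}
print(results)  # {'A': -2, 'B': -2}
```

Item B arrives with its Actual row first, so without the sort the Minus would run against an empty stack; the explicit order is what makes the result well defined.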

Related

Multiple incremental series with different prefix in SQL Server?

Here is the scenario:
Config Table:
+--------+-----------+-------+
| Prefix | Separator | Seed |
+--------+-----------+-------+
| A | # | 10000 |
+--------+-----------+-------+
Transaction Table:
+----+----------+------+
| Id | SerialNo | Col3 |
+----+----------+------+
| 1 | A#10000 | |
| 2 | A#10001 | |
+----+----------+------+
The Transaction table has a SerialNo column that has a sequential number generated based on configuration table. Configuration table determines the prefix separator and the seed value of the serial number.
In the above example the serial number would start at A#10000 and increment by 1.
But if after few months someone updates the configuration table to have
+--------+-----------+-------+
| Prefix | Separator | Seed |
+--------+-----------+-------+
| B | # | 10000 |
+--------+-----------+-------+
Then the Transaction table is supposed to look something like this:
+----+----------+------+
| Id | SerialNo | Col3 |
+----+----------+------+
| 1 | A#13000 | |
| 2 | B#10001 | |
+----+----------+------+
However there could be no duplicate serial numbers at any given point in time in Transaction table.
If someone sets Prefix back to A and seed to 10000 then the next serial number should not be A#10000 because it already exists. It should be A#13001
One could simply write a select query with MAX() and CONCAT(), but then it could cause issues with concurrency. I don't want to have duplicate serial numbers. I would also want this to be as performance-friendly as possible.
Another solution that I could come up with is to create a Windows service that keeps running and watching the table. The records get inserted with null as the serial number and the Windows service updates the serial number. This way there will be no concurrency issues, but I am not sure how reliable this is, and there will be delays.
There will only be one entry in configuration table at any given point in time.
You can solve the seed value problem quite easily in SQL Server, provided the numeric part of the serial number comes from an IDENTITY column. When someone updates the seed value back to 10000, they will need to do this via a stored procedure. The stored procedure determines what the actual next available value should be, because 10000 could clearly be the wrong value. It then executes DBCC CHECKIDENT with the correct "new_reseed_value". After that, when new records are inserted, the server will again handle the values correctly.
See the SQL Server documentation for DBCC CHECKIDENT for usage details.
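The "determine the actual next available value" step can be sketched independently of SQL Server. The Python function below is a hypothetical illustration (the name `next_serial` and its parameters are not from the original answer): it takes the configured seed and the serial numbers already issued, and returns the first number for that prefix that cannot collide. It assumes every stored serial has the form prefix + separator + integer, and it does nothing about concurrency, so in practice the same logic would have to run inside a transaction or under a lock.

```python
def next_serial(existing, prefix, separator, seed):
    """Return the next serial for `prefix` that is not already in `existing`.

    Hypothetical helper: picks the larger of the configured seed and
    (highest issued number for this prefix + 1).
    """
    lead = prefix + separator
    # Numeric parts of the serials already issued under this prefix.
    used = [int(s[len(lead):]) for s in existing if s.startswith(lead)]
    number = max([seed] + [n + 1 for n in used])
    return f"{lead}{number}"


issued = ["A#10000", "A#13000", "B#10001"]
print(next_serial(issued, "A", "#", 10000))  # A#13001
print(next_serial(issued, "C", "#", 500))    # C#500
```

With the sample data from the question, switching the prefix back to A with seed 10000 yields A#13001 rather than the already used A#10000.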

Fill deleted numbers in consecutive series of numbers and LAST_INSERT_ID

First of all I'm an amateur and non-english native speaker, so I would appreciate it if you would have a little patience with me ;)
Trying to do two things here and I'm not sure if I should do two questions about it, but since it's all related in my case, I would like to say it all in one question.
I'm making a sort of accounting software, in theory for my personal use. I'm using a DB-generated auto_increment ID for almost all my objects, but for some specific cases I need a "parallel", more open ID that won't be the primary key but can be manipulated by the user. (Yes, I've read lots of questions about "you don't need a consecutive primary key", and I understand and agree, but let me stress that this column won't be the primary key; let's call it just a "human-not-computer-expert friendly ID".) It must meet these conditions:
The Id should auto increment when no parameters given.
When a number is given as a parameter that number should be used if not occupied, if occupied throw an exception.
The user should be asked whether he/she wants to fill the IDs left missing by DELETEs and other operations; if the user says yes, the minimum missing ID should be automatically found and used.
I have no problem with doing this "by hand" in C#, but is there some way to achieve something like this in MySQL directly? I've read in the MySQL documentation that AUTO_INCREMENT does fulfill my first two conditions, but even if it fills missing deleted numbers by default, which I'm not sure of, I don't want it to do that by default. I need the software to ask first, or at least to do it based on a configuration pre-established by the user.
Therefore I think I should do it by hand in C# (at least the last part, but I suspect I will be forced to do it entirely), which brings up the question about LAST_INSERT_ID.
So, the MYSQL documentation says:
If the previous statement returned an error, the value of LAST_INSERT_ID() is undefined. For transactional tables, if the statement is rolled back due to an error, the value of LAST_INSERT_ID() is left undefined. For manual ROLLBACK, the value of LAST_INSERT_ID() is not restored to that before the transaction; it remains as it was at the point of the ROLLBACK.
I understand that LAST_INSERT_ID() is basically useless if the previous INSERT statement fails for whatever reason.
If that's the case, is there no way to retrieve the last inserted ID that ensures a known behaviour when something fails? Something like returning 0 or raising a SQL exception when the INSERT fails? And if there's no other way, what is the standard way of doing it (I suppose MAX(Id) won't do), if a standard way exists? Or should I just stop trying to do it in one go, do the updates first, check that all went OK, and then do a SELECT LAST_INSERT_ID()?
To sum up:
Is there some way to achieve a column of consecutive numbers that fulfills the given conditions in MySQL directly?
What's with LAST_INSERT_ID? Should I give up and not use it directly?
Situation 1, knowing an id that you want inserted into an AUTO_INCREMENT
Honoring that the AI is not a PK as described.
-- drop table a12b;
create table a12b
( id varchar(100) primary key,
ai_id int not null AUTO_INCREMENT,
thing varchar(100) not null,
key(ai_id)
);
insert a12b (id,thing) values ('a','fish'); -- ai_id=1
insert a12b (id,thing) values ('b','dog'); -- 2
insert a12b (id,thing) values ('b2','cat'); -- 3
delete from a12b where id='b';
insert a12b(id,ai_id,thing) values ('b',2,'dog with spots'); -- 2 ******** right here
insert a12b (id,thing) values ('z','goat'); -- 4
select * from a12b;
+----+-------+----------------+
| id | ai_id | thing |
+----+-------+----------------+
| a | 1 | fish |
| b | 2 | dog with spots |
| b2 | 3 | cat |
| z | 4 | goat |
+----+-------+----------------+
4 rows in set (0.00 sec)
Situation 2, having a system where you delete rows at some point. And want to fill those explicitly deleted gaps later: See my answer Here
Situation 3 (INNODB has a bunch of gaps sprinkled all over):
This was not part of the question. Perhaps use a left join utilizing a helper table (at least for ints, not varchars, but then again we are talking about ints). If you need to spot a gap without knowing where it is, shoot for a left join with a helper table (loaded up with numbers). I know it sounds lame, but helper tables are lean and mean and get the job done. The following would be a helper table: https://stackoverflow.com/a/33666394
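As a rough illustration of the helper-table idea, the Python sketch below builds a numbers table and left-joins it against the used ids to find the smallest gap. It uses the standard-library sqlite3 module with an in-memory database purely so that the example is runnable; the table names `numbers` and `items` and the function `smallest_missing_id` are hypothetical, and the query itself is plain SQL that should translate to MySQL with little change.

```python
import sqlite3


def smallest_missing_id(used_ids, upper):
    """Return the smallest id in 1..upper not present in used_ids, or None."""
    conn = sqlite3.connect(":memory:")
    try:
        # Helper table loaded with the candidate numbers 1..upper.
        conn.execute("CREATE TABLE numbers (n INTEGER PRIMARY KEY)")
        conn.executemany("INSERT INTO numbers (n) VALUES (?)",
                         [(n,) for n in range(1, upper + 1)])
        # Stand-in for the real table's non-key AUTO_INCREMENT column.
        conn.execute("CREATE TABLE items (ai_id INTEGER NOT NULL)")
        conn.executemany("INSERT INTO items (ai_id) VALUES (?)",
                         [(i,) for i in used_ids])
        # Numbers with no matching row are the gaps; take the lowest one.
        row = conn.execute(
            "SELECT MIN(n.n) FROM numbers AS n "
            "LEFT JOIN items AS i ON i.ai_id = n.n "
            "WHERE i.ai_id IS NULL"
        ).fetchone()
        return row[0]
    finally:
        conn.close()


print(smallest_missing_id([1, 2, 3, 4, 6], upper=10))  # 5
```

The helper table has to cover at least the range of ids in use; when no gap exists inside that range the function returns None, and the caller can fall back to normal auto-increment behaviour.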
INNODB Gap Anomaly
using the above table with 4 rows, continue with:
insert a12b (id,thing) values ('z','goat'); -- oops, problem, failed, but AI is incremented behind the scene
insert a12b (id,thing) values ('z2','goat'); -- 6 (you now have a gap)
data:
+----+-------+----------------+
| id | ai_id | thing |
+----+-------+----------------+
| a | 1 | fish |
| b | 2 | dog with spots |
| b2 | 3 | cat |
| z | 4 | goat |
| z2 | 6 | goat |
+----+-------+----------------+
There are a ton of ways to generate gaps. See This and That

Creating a Pivot table in C#

In my c# program I have someone input production for the whole day and I calculate machine usage (MU) like so:
Date | Part Number | Mold Num | Machine Num | MU
2/12/2016 | 1185-5B8 | 6580 | 12 | .428
2/12/2016 | 2249300 | 7797 | 36 | .271
2/12/2016 | 146865 | 5096789 | 12 | .260
2/16/2016 | 123456 | 7787 | 56 | .354
2/16/2016 | 123456 | 787 | 54 | .45
2/16/2016 | 123456 | 777 | 56 | .799
2/16/2016 | 123456 | 87 | 54 | .611
All of this data is in my SQL Server database. What I want is something like a pivot table that takes all rows with the same date, mold number or machine number, averages their MU, and displays the result in whatever way the user wants. Example:
Date | MU
2/12/2016 | 32.0%
2/16/2016 | 55.4%
or
Machine Num. | MU
12 | 34.4%
36 | 27.1%
54 | 53.0%
56 | 57.6%
etc. Basically I want it to be variable and to show whatever the person that is looking at it needs. I want to keep it in my c# program but I can use LINQ to SQL. Please keep in mind that I am very new to c# and LINQ to SQL.
I did try to do this but it was not exactly what I wanted to do. I also could not figure out how I was going to display it on the windows form nor how to change what was in each column.
It seems you need to build a pivot table with the row dimensions (Date, Machine Num), no column dimension, and an average over MU.
You can easily do that with the help of the NReco PivotData library:
var pivotData = new PivotData(
new string[] {"Date","Machine Num"},
new AverageAggregatorFactory("MU"),
new DataTableReader(t) ); // just a sample - you can use DB data reader
var pivotTable = new PivotTable(
new []{"Date"}, // row dimension(s)
new string[0], // column dimension(s): none
pivotData );
// use pivotTable.RowKeys and indexer for accessing pivot table values
PivotData library can be used for free (I'm an author), and you can download examples package on the component's page. Also you may check advanced PivotData Toolkit components (for rendering pivot table to HTML/CSV/Excel/PDF, aggregating data on DB level with GROUP BY and many others), but they are not free.
According to your description, what you want isn't a pivot table, as already mentioned by @juharr.
For SQLite, this could be accomplished with the following statement. (I assume the table name is tmp, and that what you want is the sum of MU for each MachineNum, named SMU.)
SELECT MachineNum,SUM(MU) AS SMU From tmp GROUP BY MachineNum ORDER BY MachineNum;
Hope it helps.
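The "variable" grouping the question asks for boils down to a group-by average where the grouping column is chosen at run time. The Python sketch below shows that shape using the sample rows from the question; `average_mu` and the dictionary keys are hypothetical names chosen for the illustration, and the same idea could be written in C# with LINQ's GroupBy.

```python
from collections import defaultdict

# Sample production rows from the question (only the columns used here).
ROWS = [
    {"Date": "2/12/2016", "Mold Num": "6580", "Machine Num": "12", "MU": 0.428},
    {"Date": "2/12/2016", "Mold Num": "7797", "Machine Num": "36", "MU": 0.271},
    {"Date": "2/12/2016", "Mold Num": "5096789", "Machine Num": "12", "MU": 0.260},
    {"Date": "2/16/2016", "Mold Num": "7787", "Machine Num": "56", "MU": 0.354},
    {"Date": "2/16/2016", "Mold Num": "787", "Machine Num": "54", "MU": 0.45},
    {"Date": "2/16/2016", "Mold Num": "777", "Machine Num": "56", "MU": 0.799},
    {"Date": "2/16/2016", "Mold Num": "87", "Machine Num": "54", "MU": 0.611},
]


def average_mu(rows, key):
    """Average MU per distinct value of the column named by `key`."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum, count]
    for row in rows:
        bucket = totals[row[key]]
        bucket[0] += row["MU"]
        bucket[1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


by_machine = average_mu(ROWS, "Machine Num")
by_date = average_mu(ROWS, "Date")
for machine, mu in sorted(by_machine.items()):
    print(f"{machine} | {mu:.1%}")
```

Because the grouping column is just a parameter, the user interface only needs to pass a different key ("Date", "Mold Num", "Machine Num") to get a different summary.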

Linking two tables in dbContext where one key is not defined in the database

I've spent a bit of time looking at this and cannot find an answer online (Perhaps I am searching for the wrong thing..)
Below is a simplified version of the problem.
I have two tables:
Table 1 : Areas
AreaID | Group 1 | Group2 | Group3
----------------------------------------------
1 | 2 | 22 | 10
2 | 5 | 1 | 9
3 | 4 | 3 | 2
Table 2 : Groups
GroupID | Group | Code | Description
-------------------------------------------------
1 | 1 | 2 | Description 1
2 | 1 | 5 | Description 2
3 | 1 | 4 | Description 3
4 | 2 | 22 | Description 4
5 | 2 | 1 | Description 5
6 | 2 | 3 | Description 6
7 | 3 | 10 | Description 7
8 | 3 | 9 | Description 8
9 | 3 | 2 | Description 9
So the SQL to get the Group description for Area 1 Group 3 would be:
Select g.Description from Areas a
inner join Groups g on g.Code = a.Group3 and g.Group = 3
where a.AreaID = 1
To clarify the Areas table has one Foreign Key linking it to the Groups table, but to get a unique record from the groups table you also need to have the "Group" column.
This is fine using ADO.Net or Stored procs, but we really would like to use EF and be able to navigate between the entities properly.
I also need to point out that for the purposes of this project we ONLY need the Group3 from the areas table, we are not interested in any other groupings at the moment.
Where I am up to:
I have created classes to represent the tables in my project, and I have added model bindings to the context to define a relationship between the Area and the Group, based on the Area.Group3 column mapping to the Group.Code column. This works (it's essentially a many-to-many relationship in EF at the moment), but it also brings out all of the rows for other groups where the Code column matches (for example, code 2 in the data above brings back GroupIDs 1 and 9).
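The ambiguity can be reproduced outside EF with the sample rows. The Python sketch below is only an illustration of the matching logic, not of any Entity Framework API; the names `match_by_code` and `match_by_code_and_group` are made up for the example.

```python
# AreaID -> Group3 value, taken from the Areas table above.
AREA_GROUP3 = {1: 10, 2: 9, 3: 2}

# (GroupID, Group, Code) rows from the Groups table above.
GROUPS = [
    (1, 1, 2), (2, 1, 5), (3, 1, 4),
    (4, 2, 22), (5, 2, 1), (6, 2, 3),
    (7, 3, 10), (8, 3, 9), (9, 3, 2),
]


def match_by_code(code):
    """GroupIDs matched when joining on Code alone (the current mapping)."""
    return [gid for gid, _, c in GROUPS if c == code]


def match_by_code_and_group(code, group):
    """GroupIDs matched when the join is also constrained to one Group."""
    return [gid for gid, grp, c in GROUPS if c == code and grp == group]


print(match_by_code(AREA_GROUP3[3]))               # [1, 9]
print(match_by_code_and_group(AREA_GROUP3[3], 3))  # [9]
```

Area 3 has Group3 = 2, and code 2 exists in both Group 1 and Group 3, so only the extra Group = 3 condition makes the match unique.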
What I would like to be able to do is constrain it in the context by saying something like
modelBuilder.Entity<Area>()
.HasRequired(a => a.Group)
.WithMany(g => g.Areas)
.Map(m => m.MapKey("Group3"));
but of course the above, without a constraint on Group.Group = 3, brings back multiple groups for each Area, which in turn breaks because the binding is telling it to expect one!
That's a bit of a ramble. If anyone needs clarification on the above to be able to help, let me know and I'll get the info to you!
Thanks

How do I address an issue with an overly wide index?

I am not very proficient at SQL yet. I'm learning, but it's a slow process. I am working on a project at work which stores a good deal of information in a database in SQL Server. In one of the tables, ContactInformation, we're experiencing an error when an attempt to modify an entry runs afoul because a nonclustered index composed of all of the address information exceeds 900 bytes. I've used sys.dm_db_index_usage_stats to verify that modifying an entry in the table leads to 3 user_seeks and 1 user_update.
The C# code does not seem to be directly calling the index. It executes a single DbCommand that consists of a stored procedure command of the Update variety with 19 parameters. My thoughts are to either eliminate the index or to try to break up the DbCommand into multiple updates with a smaller number of parameters in hopes of having a smaller index to work with.
I am a bit at sea due to my lack of experience. I welcome any advice on which way to turn next.
The Index consists of the following:
| Name | Data Type | Size |
|----------------------|---------------|------|
| ContactInformationID | int | 4 |
| CompanyID | smallint | 2 |
| Address1 | nvarchar(420) | 840 |
| Address2 | nvarchar(420) | 840 |
| City | nvarchar(420) | 840 |
| State | nvarchar(220) | 440 |
| PostalCode | nvarchar(120) | 240 |
| Country | nvarchar(220) | 440 |
Yes, most of the columns are oversized. We apparently inherited this database from a different project. Our software limits most of the columns to no more than 100 characters, although there are some outliers.
The index size limit only applies to the key columns. It applies to all B-tree based storage modes (NCI and CI). This limit exists to ensure a certain degree of tree fanout in order to bound the tree height.
If you don't need to seek on columns such as Address1 and Address2 (considering that they might be null as well) make those columns included columns.
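To see how far the current definition is over the limit, the key size can be added up from the table in the question. The Python sketch below does that arithmetic; which columns stay in the key (`KEPT_AS_KEYS`) is a hypothetical choice made for illustration, and the 900-byte figure is the limit quoted in the question.

```python
KEY_LIMIT_BYTES = 900  # limit reported in the question

# (column, maximum size in bytes) as listed in the question; nvarchar(n)
# can occupy up to 2 * n bytes.
INDEX_COLUMNS = [
    ("ContactInformationID", 4),
    ("CompanyID", 2),
    ("Address1", 840),
    ("Address2", 840),
    ("City", 840),
    ("State", 440),
    ("PostalCode", 240),
    ("Country", 440),
]

# Hypothetical choice: keep only the ids as key columns and move the
# address fields to included (non-key) columns.
KEPT_AS_KEYS = {"ContactInformationID", "CompanyID"}


def key_bytes(columns, keys=None):
    """Maximum key size for the given columns, optionally limited to `keys`."""
    return sum(size for name, size in columns if keys is None or name in keys)


current = key_bytes(INDEX_COLUMNS)
narrowed = key_bytes(INDEX_COLUMNS, KEPT_AS_KEYS)
print(current, current > KEY_LIMIT_BYTES)    # 3646 True
print(narrowed, narrowed > KEY_LIMIT_BYTES)  # 6 False
```

With every column in the key, the maximum key size is 3646 bytes, roughly four times the limit; with only the two id columns as keys it drops to 6 bytes, and the address data can still be carried as included columns.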
The index key should never be longer than the shortest key prefix that results in a unique index. Any column beyond that prefix gains nothing from being a key column rather than an included column.
If ContactInformationID is unique, which I have a feeling it very well could be, then having any other fields in the index is pointless.
Such an index is useful only for queries where the value of ContactInformationID is present as a query parameter, and when it is, the rest of the fields are immaterial.
