Calculating change in column over groups and extracting based on criteria - c#

I am new to coding in U-SQL/C# and am stuck on a windowing/aggregation problem.
My data looks like this:
Name Date OrderNo Type Balance
one 2018-06-25T04:55:44.0020987Z 1 Drink 15
one 2018-06-25T04:57:44.0020987Z 1 Drink 70
one 2018-06-25T04:59:44.0020987Z 1 Drink 33
one 2018-06-25T04:59:49.0020987Z 1 Drink 25
two 2018-06-25T04:55:44.0020987Z 2 Drink 22
two 2018-06-25T04:57:44.0020987Z 2 Drink 81
two 2018-06-25T04:58:44.0020987Z 2 Drink 33
two 2018-06-25T04:59:44.0020987Z 2 Drink 45
In U-SQL I am adding a unique id based on combinations of name, orderno and type and for the purpose of sorting, I am adding another one including the date.
@files =
    EXTRACT
        name string,
        date DateTime,
        type string,
        orderno int,
        balance int
    FROM
        @InputFile
    USING new JsonExtractor();
@files2 =
    SELECT *,
        DENSE_RANK() OVER(ORDER BY name,type,orderno,date) AS group_id,
        DENSE_RANK() OVER(ORDER BY name,type,orderno) AS id
    FROM @files;
My Data now looks like this:
Name Date OrderNo Type Balance group_id id
one 2018-06-25T04:55:44.0020987Z 1 Drink 15 1 1
one 2018-06-25T04:57:44.0020987Z 1 Drink 70 2 1
one 2018-06-25T04:59:44.0020987Z 1 Drink 33 3 1
one 2018-06-25T04:59:49.0020987Z 1 Drink 25 4 1
two 2018-06-25T04:55:44.0020987Z 2 Drink 22 5 2
two 2018-06-25T04:57:44.0020987Z 2 Drink 81 6 2
two 2018-06-25T04:58:44.0020987Z 2 Drink 33 7 2
two 2018-06-25T04:59:44.0020987Z 2 Drink 45 8 2
(I have shown only 4 records per group, but the real data has many more per group.)
I am stuck at determining the difference between successive rows in the balance column in each group.
Expected Output for Part 1:
Name Date OrderNo Type Balance group_id id increase
one 2018-06-25T04:55:44.0020987Z 1 Drink 15 1 1 0
one 2018-06-25T04:57:44.0020987Z 1 Drink 70 2 1 55
one 2018-06-25T04:59:44.0020987Z 1 Drink 33 3 1 -37
one 2018-06-25T04:59:49.0020987Z 1 Drink 25 4 1 -8
two 2018-06-25T04:55:44.0020987Z 2 Drink 22 5 2 0
two 2018-06-25T04:57:44.0020987Z 2 Drink 81 6 2 59
two 2018-06-25T04:58:44.0020987Z 2 Drink 33 7 2 -48
two 2018-06-25T04:59:44.0020987Z 2 Drink 45 8 2 12
For every new group (defined by id) the increase should start from zero.
I went through Stack Overflow and saw the LAG function from PostgreSQL. I could not find a C# equivalent. Is that applicable in this case?
Any help is appreciated. Further clarification will be provided if required.
Update: When I use CASE WHEN my solution looks like this
CURRENT OUTPUT                 DESIRED OUTPUT
id  Balance  Increase          id  Balance  Increase
1   15       0                 1   15       0
1   70       55                1   70       55
1   33       -37               1   33       -37
1   25       -8                1   25       -8
2   22       "-3"              2   22       "0"
2   81       59                2   81       59
2   33       -48               2   33       -48
2   45       12                2   45       12
Look at the highlighted row. The increase column must start at 0 for each id.
Update: I was able to solve the first part of my question. See my answer below.
The second part that I had posted earlier was incorrectly posted. I have removed that.

You can use the LAG window function in a subquery to get the previous Balance, then filter on the result in the outer WHERE clause.
SELECT * FROM (
    SELECT *,
        DENSE_RANK() OVER(ORDER BY name,type,orderno,date) AS group_id,
        DENSE_RANK() OVER(ORDER BY name,type,orderno) AS id,
        (CASE WHEN LAG(Balance) OVER(ORDER BY name,type,orderno) IS NULL THEN 0
              ELSE Balance - LAG(Balance) OVER(ORDER BY name,type,orderno)
         END) as increase
    FROM @files
) t1
WHERE increase > 50

The query that finally worked for me was this:
@files =
    EXTRACT
        name string,
        date DateTime,
        type string,
        orderno int,
        balance int
    FROM
        @InputFile
    USING new JsonExtractor();
@files2 =
    SELECT *,
        DENSE_RANK() OVER(ORDER BY name,type,orderno) AS group_id
    FROM @files;
@files3 =
    SELECT *,
        DENSE_RANK() OVER(PARTITION BY group_id ORDER BY date) AS group_order
    FROM @files2;
@files4 =
    SELECT *,
        (CASE WHEN group_order == 1 THEN 0
              ELSE balance - LAG(balance) OVER(ORDER BY name,type,orderno)
         END) AS increase
    FROM @files3;
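For comparison, here is a minimal sketch of the same per-group difference done with a single LAG that is partitioned by the group columns and ordered by date. It is not U-SQL: it runs the query through Python's built-in sqlite3 module (assuming the bundled SQLite is 3.25 or newer, which is when window functions were added), and the table name `files` and the truncated timestamps are stand-ins for the question's data. Whether the same PARTITION BY form carries over unchanged to U-SQL is not verified here.

```python
import sqlite3

# Sample rows from the question: (name, date, orderno, type, balance).
# Timestamps are shortened; they only need to sort correctly as text.
ROWS = [
    ("one", "2018-06-25T04:55:44", 1, "Drink", 15),
    ("one", "2018-06-25T04:57:44", 1, "Drink", 70),
    ("one", "2018-06-25T04:59:44", 1, "Drink", 33),
    ("one", "2018-06-25T04:59:49", 1, "Drink", 25),
    ("two", "2018-06-25T04:55:44", 2, "Drink", 22),
    ("two", "2018-06-25T04:57:44", 2, "Drink", 81),
    ("two", "2018-06-25T04:58:44", 2, "Drink", 33),
    ("two", "2018-06-25T04:59:44", 2, "Drink", 45),
]

# LAG restarts in every partition, so the first row of each group has no
# previous balance (NULL); COALESCE turns that NULL difference into 0.
QUERY = """
SELECT name, date, balance,
       COALESCE(
           balance - LAG(balance) OVER (
               PARTITION BY name, type, orderno
               ORDER BY date
           ),
           0
       ) AS increase
FROM files
ORDER BY name, type, orderno, date
"""


def increases(rows):
    """Load the rows into an in-memory table and return the increase column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files (name TEXT, date TEXT, orderno INTEGER, "
        "type TEXT, balance INTEGER)"
    )
    conn.executemany("INSERT INTO files VALUES (?, ?, ?, ?, ?)", rows)
    result = [row[3] for row in conn.execute(QUERY)]
    conn.close()
    return result


if __name__ == "__main__":
    print(increases(ROWS))
```

Because the window is partitioned, no separate group_order column is needed to detect the first row of a group.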

Related

Query to select all rows with a condition

I need a query that selects all rows belonging to an ID from the data table when that ID has two specific Lesson values. For example, IDs 2 and 3 both have lessons D and E, so I want to take all the rows of ID 2 and ID 3.
My data is in MS Access.
My programming language is C#.
I have already tried writing a query:
"SELECT * FROM MainData WHERE ID IN ( SELECT ID FROM MainData GROUP BY ID HAVING (Lesson ='" + txtbox_startsubject.Text + "' and '" + TargetPoint + "'))";
The format of data:
ID Lesson Time Score
1 C 165 4
1 E 190 3
1 H 195 3
1 I 200 4
2 A 100 2
2 B 150 5
2 D 210 2
2 R 10 3
2 E 110 4
3 D 130 5
3 E 190 5
3 H 210 4
3 I 160 4
3 J 110 4
4 E 120 3
4 H 150 4
4 J 170 4
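One way to express "IDs that have both lessons" is to require the ID to appear in two subqueries, one per lesson. The sketch below is only an illustration under assumptions: it runs against an in-memory SQLite table through Python's sqlite3 module rather than against Access, the function name is made up, and Access may need brackets around the `Time` column name. It also passes the two lesson values as `?` parameters instead of concatenating them into the SQL string.

```python
import sqlite3

# The sample data from the question: (ID, Lesson, Time, Score).
MAIN_DATA = [
    (1, "C", 165, 4), (1, "E", 190, 3), (1, "H", 195, 3), (1, "I", 200, 4),
    (2, "A", 100, 2), (2, "B", 150, 5), (2, "D", 210, 2), (2, "R", 10, 3),
    (2, "E", 110, 4),
    (3, "D", 130, 5), (3, "E", 190, 5), (3, "H", 210, 4), (3, "I", 160, 4),
    (3, "J", 110, 4),
    (4, "E", 120, 3), (4, "H", 150, 4), (4, "J", 170, 4),
]

# An ID qualifies only if it has a row for the first lesson AND a row for
# the second lesson; all rows of qualifying IDs are then returned.
QUERY = """
SELECT ID, Lesson, Time, Score
FROM MainData
WHERE ID IN (SELECT ID FROM MainData WHERE Lesson = ?)
  AND ID IN (SELECT ID FROM MainData WHERE Lesson = ?)
ORDER BY ID, Lesson
"""


def rows_with_both_lessons(rows, first, second):
    """Return every row whose ID has both the `first` and `second` lesson."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE MainData (ID INTEGER, Lesson TEXT, Time INTEGER, Score INTEGER)"
    )
    conn.executemany("INSERT INTO MainData VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY, (first, second)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in rows_with_both_lessons(MAIN_DATA, "D", "E"):
        print(row)
```

For lessons D and E this keeps the rows of IDs 2 and 3 and drops IDs 1 and 4, which only have E.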

How to sum the column which is have same Items

I want to sum TU for the rows that have the same item. My table is:
TID ITEMS TIMES TU
1 D 5 136
1 M 5 136
1 R 14 136
4 D 2 106
4 B 6 106
4 H 1 106
5 D 1 97
5 B 6 97
5 M 6 97
7 D 4 77
7 B 2 77
How do I find the matching items in the table?
I would suggest something like this in SQL:
SELECT items, SUM(tu)
FROM table
GROUP BY items

How to do multiple select sum sub-queries

This is my first post so if I do anything incorrectly concerning the post please correct me.
I am creating a scoring program for a fishing competition.
I have the following tables (I am only going to list the columns of interest):
tblScores:
|Column Name|
Pk_CatchID
Fk_AnglerID
Day
Fk_FishID
Points
tblAnglers:
|Column Name|
Pk_AnglerID
Fk_BoatID
Name
tblBoats:
|Column Name|
Pk_BoatID
BoatName
Now what I want to do is create a score sheet for a whole week of competition, which is 5 days. So I have to do a sum of the scores and use the respective foreign keys to sum the scores for each boat.
This is what I have currently:
Select BoatName, " +
"Sum(tblScores.Points) AS [Day 1] " +
"from tblScores INNER JOIN tblAnglers ON tblScores.Fk_AnglerID=tblAnglers.Pk_AnglerID " +
" INNER JOIN tblBoats ON tblAnglers.Fk_BoatID=tblBoats.Pk_BoatID "
+ " where Day=1 GROUP BY BoatName
This works perfectly fine for one day, but what I would like to do is view this data in a DataGridView with columns for each day and then a total column as well.
Something like this:
|Boat Name|Day 1|Day 2|Day 3|Day 4|Day 5|Total|
|Example1 | 50 | 30 | 65 | 35 | 40 | 220 |
|Example2 | 40 | 50 | 70 | 35 | 30 | 225 |
I have tried using nested selects but I could not get this to work. I am open to suggestions on how this can be solved.
Also my other thought was to create a new table and keep these scores for each day in there(or even in the boats table) but I feel that the structure of the database would not be as good as data would be repeated. But I could be wrong.
Thank you all!
Also: I am using Visual Studio 2013 (C#) and Microsoft SQL Server 2010.
Try this:
Select
BoatName,
Sum(Case When Day = 1 Then tblScores.Points Else 0 End) AS [Day1],
Sum(Case When Day = 2 Then tblScores.Points Else 0 End) AS [Day2],
Sum(Case When Day = 3 Then tblScores.Points Else 0 End) AS [Day3],
Sum(Case When Day = 4 Then tblScores.Points Else 0 End) AS [Day4],
Sum(Case When Day = 5 Then tblScores.Points Else 0 End) AS [Day5],
Sum(tblScores.Points) As Total
From tblScores
INNER JOIN tblAnglers ON tblScores.Fk_AnglerID=tblAnglers.Pk_AnglerID
INNER JOIN tblBoats ON tblAnglers.Fk_BoatID=tblBoats.Pk_BoatID
GROUP BY BoatName
Order By BoatName
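To see the conditional-aggregation pattern run end to end, here is a small sketch that executes an equivalent query against an in-memory SQLite database from Python. The table and column names come from the question; the boats, anglers and points are invented purely for illustration, and SQLite stands in for SQL Server, so treat it as a demonstration of the technique rather than of the exact server behaviour.

```python
import sqlite3

# Hypothetical sample data: two boats, three anglers, a few catches.
SCHEMA = """
CREATE TABLE tblBoats (Pk_BoatID INTEGER PRIMARY KEY, BoatName TEXT);
CREATE TABLE tblAnglers (Pk_AnglerID INTEGER PRIMARY KEY, Fk_BoatID INTEGER, Name TEXT);
CREATE TABLE tblScores (Pk_CatchID INTEGER PRIMARY KEY, Fk_AnglerID INTEGER,
                        Day INTEGER, Fk_FishID INTEGER, Points INTEGER);
INSERT INTO tblBoats VALUES (1, 'Example1'), (2, 'Example2');
INSERT INTO tblAnglers VALUES (1, 1, 'Ann'), (2, 1, 'Bob'), (3, 2, 'Cy');
INSERT INTO tblScores VALUES
    (1, 1, 1, 10, 30), (2, 2, 1, 11, 20), (3, 1, 2, 10, 30),
    (4, 3, 1, 12, 40), (5, 3, 3, 12, 70);
"""

# One SUM(CASE ...) column per day: rows from other days contribute 0.
DAY_COLUMNS = ",\n       ".join(
    f"SUM(CASE WHEN s.Day = {day} THEN s.Points ELSE 0 END) AS Day{day}"
    for day in range(1, 6)
)

QUERY = f"""
SELECT b.BoatName,
       {DAY_COLUMNS},
       SUM(s.Points) AS Total
FROM tblScores AS s
INNER JOIN tblAnglers AS a ON s.Fk_AnglerID = a.Pk_AnglerID
INNER JOIN tblBoats AS b ON a.Fk_BoatID = b.Pk_BoatID
GROUP BY b.BoatName
ORDER BY b.BoatName
"""


def score_sheet():
    """Return one row per boat: (BoatName, Day1..Day5, Total)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in score_sheet():
        print(row)
```

Each returned row can be bound directly to a grid with one column per day plus the total, so no extra table of per-day scores is needed.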

convert rows to columns in Access

I have read many questions on Stack Overflow related to my problem, but I don't think they quite address it. Basically, I download an XML dataset with lots of data and insert that data into my MS Access database. What I want to do is convert the data so that some specific rows become columns.
I could probably do this manually in code before inserting the data into the database, but that would require a lot of time and code changes, so I'm wondering if it's possible to do this with MS Access.
Here's how my table basically looks, and how I want to convert it.
The index is not so relevant in my case
[Table1]
[Index]  [Name]    [Data]  [NameID]
1        AA        14      1
2        BB(date)  42      1
3        CC        64      1
4        DD        61      1
5        AA        15      2
6        BB(date)  35      2
7        CC        67      2
8        DD        63      2
9        AA        9       3
10       CC        10      3
11       AA        19      2
12       BB(date)  20      2
13       CC        21      2
14       DD        12      2
15       BB(date)  83      4
16       CC        1       4
17       DD        87      4

=> [Table1_converted]
[NameID]  [AA]   [BB]   [CC]   [DD]
1         14     date1  64     61
2         15+19  date2  67+21  63+12
3         9             10
4                date4  1      87
I forgot to mention that the values under the [Name] column are not really AA, BB, CC.
They are more complex than that. AA is actually something like "01 - NameAA", without the quotation marks.
I also forgot one important element of my question: if a [Name] (e.g. AA) occurs more than once with the same [NameID], the [Data] values should be summed. I have edited the tables; in the converted table I have written e.g. 15+19 or 35+20, which only illustrates which values are summed.
One more edit, hopefully the last: one of the [Name]s, BB, holds a Datetime value in [Data].
The NameID can be anything; it does not matter. So I need a query that makes an exception for [Name] BB, so that its [Data] is not summed the way every other [Name]'s [Data] is. Where a date is written multiple times for the same [Name] and [NameID], it is always the same date.
To accomplish this in Access, all you need to do is
TRANSFORM Sum([Data]) AS SumOfData
SELECT [NameID]
FROM [Table1]
GROUP BY [NameID]
PIVOT [Name]
edit re: revised question
To handle some [Name]s differently we would need to assemble the results (Sum()s, etc.) first, and then crosstab the results
For test data in [Table1]:
Index Name Data NameID
----- ---- ---------- ------
1 AA 14 1
2 BB 2013-12-01 1
3 CC 64 1
4 DD 61 1
5 AA 15 2
6 BB 2013-12-02 2
7 CC 67 2
8 DD 63 2
9 AA 9 3
10 CC 10 3
11 AA 19 2
12 BB 2013-12-02 2
13 CC 21 2
14 DD 12 2
15 BB 2013-12-04 4
16 CC 1 4
17 DD 87 4
the query
TRANSFORM First(columnData) AS whatever
SELECT [NameID]
FROM
(
SELECT [NameID], [Name], Sum([Data]) AS columnData
FROM [Table1]
WHERE [Name] <> 'BB'
GROUP BY [NameID], [Name]
UNION ALL
SELECT DISTINCT [NameID], [Name], [Data]
FROM [Table1]
WHERE [Name] = 'BB'
)
GROUP BY [NameID]
PIVOT [Name]
produces
NameID AA BB CC DD
------ -- ---------- -- --
1 14 2013-12-01 64 61
2 34 2013-12-02 88 75
3 9 10
4 2013-12-04 1 87
Try this SQL query; it may be what you are looking for:
SELECT NameID , [AA] as AA,[BB] as BB,[CC] as CC,[DD] as DD
FROM
(
SELECT Name,Data,NameID FROM Table1
)PivotData
PIVOT
(
max(Data) for Name in ([AA],[BB],[CC],[DD])
) AS Pivoting
I think you need to do this:
1) Load your Table1 as it is into SQL Server
2) Then run the following query
DECLARE @cols AS NVARCHAR(MAX),
        @query AS NVARCHAR(MAX)
-- Build the comma-separated, quoted list of distinct [Name] values
select @cols = STUFF((SELECT distinct ',' + QUOTENAME(Name)
            from [Table1]
            FOR XML PATH(''), TYPE
            ).value('.', 'NVARCHAR(MAX)')
        ,1,1,'')
-- Pivot [Data] into one column per [Name], one row per NameID
set @query = 'SELECT NameID,' + @cols + '
    from
    (
        select NameID, Name, Data
        from Table1
    ) T
    pivot
    (
        max(Data)
        for Name in (' + @cols + ')
    ) p '
execute sp_executesql @query;
DECLARE @Table1 TABLE ([Index] INT,[Name] CHAR(2),[Data] INT,[NameID] INT)
INSERT INTO @Table1
VALUES
(1,'AA',14,1),
(2,'BB',42,1),
(3,'CC',64,1),
(4,'DD',61,1),
(5,'AA',15,2),
(6,'BB',35,2),
(7,'CC',67,2),
(8,'DD',63,2),
(9,'AA',9,3),
(10,'CC',10,3),
(11,'BB',83,4),
(12,'CC',1,4),
(13,'DD',87,4)
SELECT [NameID] , ISNULL([AA], '') AS [AA], ISNULL([BB], '') AS [BB]
     , ISNULL([CC], '') AS [CC], ISNULL([DD], '') AS [DD]
FROM
(
    SELECT NAME, DATA, NAMEID
    FROM @Table1
)q
PIVOT
(
    SUM(DATA)
    FOR NAME
    IN ([AA], [BB], [CC], [DD])
)P
Result Set
NameID AA BB CC DD
1 14 42 64 61
2 15 35 67 63
3 9 10
4 83 1 87

Update all records using a function sql

I am looking to update a calculated value in SQL.
Basically I have a table:
ImportID SerialNumber Day Hour value Difference Complete
1 123 1 1 6 NULL 0
2 123 1 2 8 NULL 0
3 123 1 5 21 NULL 0
4 123 1 6 28 NULL 0
5 222 2 1 12 NULL 0
6 222 2 5 18 NULL 0
7 222 2 4 16 NULL 0
8 222 1 12 8 NULL 0
For each serial number there is a day (1-365) and an hour (1-12). All I want to do is calculate the Difference field from the previous record.
For example, for ImportID 6 I need to get the record on the same day and the hour before (ImportID 7), then update Difference using the value fields, which gives 18 - 16 = 2.
N.B. There may be gaps in the sequence, and if there is no previous record the difference should stay NULL. Once the differences have been calculated, the rows need to be inserted into a new table, but only when the difference is not null and the row doesn't already exist in that table; on a successful insert they get marked as complete. Also, the previous record can fall on the previous day: (day 1, hour 12) is the record before (day 2, hour 1).
Currently I am using a loop to select the null values, get the previous record, update the record, insert into the other table if it is OK, and update the Complete field.
My issue is that this runs on a million records, and it takes a long while to select the applicable records (Complete = 0) into a temp table and loop through each one.
Is there a quicker way to process these in bulk, as a single update statement or as separate statements?
The result should be
ImportID SerialNumber Day Hour value Difference Complete
1 123 1 1 6 NULL 0
2 123 1 2 8 2 1
3 123 1 5 21 NULL 0
4 123 1 6 28 7 1
5 222 2 1 12 4 1
6 222 2 5 18 2 1
7 222 2 4 16 NULL 0
8 222 1 12 8 NULL 0
Thanks in advance
I think this is basically it, isn't it?
DECLARE @TABLE TABLE
(
    ImportId INT,
    SerialNumber INT,
    Day INT,
    Hour INT,
    Value INT,
    Difference INT,
    Complete INT
)
INSERT INTO @TABLE VALUES
(1,123,1,1,6,NULL,0),
(2,123,1,2,8,NULL,0),
(3,123,1,5,21,NULL,0),
(4,123,1,6,28,NULL,0),
(5,222,2,1,12,NULL,0),
(6,222,2,5,18,NULL,0),
(7,222,2,4,16,NULL,0),
(8,222,1,12,8,NULL,0)
SELECT * FROM @TABLE
UPDATE T
SET T.Difference = T.Value - TT.Value,
    Complete = 1
FROM
(
    SELECT *
        ,ROW_NUMBER() OVER (PARTITION BY SerialNumber ORDER BY Day ASC, Hour ASC) AS RowCounter
    FROM @TABLE
    WHERE Complete = 0 --Ignore completed ones
) AS T
INNER JOIN
(
    SELECT *
        ,ROW_NUMBER() OVER (PARTITION BY SerialNumber ORDER BY Day ASC, Hour ASC) AS RowCounter
    FROM @TABLE
) AS TT
ON T.SerialNumber = TT.SerialNumber
WHERE
(
    T.RowCounter = TT.RowCounter + 1
    AND
    T.Day = TT.Day
    AND
    T.Hour = TT.Hour + 1
)
OR
(
    T.Day = TT.Day + 1
    AND
    T.Hour = 1
    AND
    TT.Hour = 12
)
SELECT * FROM @TABLE
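An alternative to the self-join is to number every (Day, Hour) pair as a single running slot and use LAG to look at the previous row of the same serial number; the difference is kept only when the previous row sits in the immediately preceding slot. The sketch below is an assumption-laden illustration, not the poster's method: it uses SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or newer), a made-up table name `Readings`, the answer's test rows, and the question's 12 hours per day. It only computes the differences; the insert into the second table and the Complete flag are left out.

```python
import sqlite3

# Test rows from the answer:
# (ImportID, SerialNumber, Day, Hour, Value)
READINGS = [
    (1, 123, 1, 1, 6),
    (2, 123, 1, 2, 8),
    (3, 123, 1, 5, 21),
    (4, 123, 1, 6, 28),
    (5, 222, 2, 1, 12),
    (6, 222, 2, 5, 18),
    (7, 222, 2, 4, 16),
    (8, 222, 1, 12, 8),
]

# slot turns (Day, Hour) into one running hour index, so day 1 hour 12
# (slot 12) is directly followed by day 2 hour 1 (slot 13).
# The CASE has no ELSE, so gaps and first rows yield NULL.
QUERY = """
SELECT ImportID,
       CASE
           WHEN slot - LAG(slot) OVER (PARTITION BY SerialNumber ORDER BY slot) = 1
           THEN Value - LAG(Value) OVER (PARTITION BY SerialNumber ORDER BY slot)
       END AS Difference
FROM (
    SELECT ImportID, SerialNumber, Value, (Day - 1) * 12 + Hour AS slot
    FROM Readings
) AS r
ORDER BY ImportID
"""


def differences(rows):
    """Return {ImportID: Difference}, with None where no previous hour exists."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Readings (ImportID INTEGER, SerialNumber INTEGER, "
        "Day INTEGER, Hour INTEGER, Value INTEGER)"
    )
    conn.executemany("INSERT INTO Readings VALUES (?, ?, ?, ?, ?)", rows)
    result = dict(conn.execute(QUERY).fetchall())
    conn.close()
    return result


if __name__ == "__main__":
    print(differences(READINGS))
```

On these rows the non-NULL differences are 2, 7, 4 and 2 for ImportIDs 2, 4, 5 and 6, matching the expected result in the question.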
