How to sum a column for rows that have the same items - C#

I want to sum TU for rows that have the same items. My table is:
TID  ITEMS  TIMES  TU
1    D      5      136
1    M      5      136
1    R      14     136
4    D      2      106
4    B      6      106
4    H      1      106
5    D      1      97
5    B      6      97
5    M      6      97
7    D      4      77
7    B      2      77
How can I check for similar items in the table?

I would suggest something like this in SQL:
SELECT items, SUM(tu)
FROM table
GROUP BY items
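Since the question is tagged C#, the same aggregation works with a LINQ GroupBy once the rows are in memory. A minimal sketch, assuming a simple Row class (the class and property names here are invented for illustration):

using System;
using System.Collections.Generic;
using System.Linq;

class Row
{
    public int Tid;
    public string Items;
    public int Times;
    public int Tu;
}

class Program
{
    static void Main()
    {
        var rows = new List<Row>
        {
            new Row { Tid = 1, Items = "D", Times = 5, Tu = 136 },
            new Row { Tid = 4, Items = "D", Times = 2, Tu = 106 },
            new Row { Tid = 4, Items = "B", Times = 6, Tu = 106 },
            // ... remaining rows from the table above
        };

        // Group by the item and sum TU within each group,
        // mirroring SELECT items, SUM(tu) ... GROUP BY items.
        var sums = rows
            .GroupBy(r => r.Items)
            .Select(g => new { Items = g.Key, TuSum = g.Sum(r => r.Tu) });

        foreach (var s in sums)
            Console.WriteLine($"{s.Items}: {s.TuSum}");
    }
}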

Uniformly distributing hash of given properties

I am trying to distribute a set of items across a number of buckets. I am looking for the following properties:
Bucket assignment needs to be deterministic: in different runs the same input should end up in the same bucket.
Distribution of data between buckets should be uniform.
This should work for a fairly small number of inputs (e.g. if I want to distribute 50 inputs across 25 buckets, ideally each bucket will have 2 items).
My first try was to generate an MD5 hash from the input data and form the bucket from the first bytes of the hash. I am not too satisfied with the uniformity. It works well when the input is large but not so well for small inputs, e.g. distributing 100 items across 64 buckets:
List<string> l = new List<string>();
for (int i = 0; i < 100; i++)
{
    l.Add(string.Format("data{0}.txt", i));
}
int[] buckets = new int[64];
var md5 = MD5.Create();
foreach (string str in l)
{
    byte[] hash = md5.ComputeHash(Encoding.Default.GetBytes(str));
    uint bucket = BitConverter.ToUInt32(hash, 0) % 64;
    buckets[bucket]++;
}
Any suggestions on what I could do to achieve higher uniformity? Thanks.
Leaving aside the efficiency of using MD5 for this purpose (see the discussion here and in the marked duplicate of that question), basically the answer is that what you have is what a uniform distribution really looks like.
That might seem counter-intuitive, but it's easily demonstrable either mathematically or by experiment.
As a kind of motivating example, consider the task of choosing exactly 64 numbers in the range 0-63. The odds that you will get one per bucket are very close to 0. There are 64^64 possible sequences, of which 64! contain all 64 numbers. The odds of getting one of these sequences is about one in 3.1×10^26. In fact, the odds of getting a sequence in which no element appears three times is less than one in a thousand (it's about .000658). So it's almost certain that a random uniform sample of 64 numbers in the range 0-63 will have some triplets, and it's pretty likely that there will be some quadruplets. If the sample is 100 numbers, those probabilities just get even bigger.
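As a quick check of that figure (64! and 64^64 are standard magnitudes):

    P(all 64 buckets hit exactly once) = 64! / 64^64
                                       ≈ (1.27 × 10^89) / (3.94 × 10^115)
                                       ≈ 3.2 × 10^-27
                                       ≈ 1 / (3.1 × 10^26)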
But the maths are not so easy to compute in general, so here I chose to illustrate by experiment :-), using random.org, which is a pretty reliable source of random numbers. I asked it for 100 numbers in the range 0-63, and counted them (using bash, so my "graph" is not as pretty as yours). Here are two runs:
First run:
Random numbers:
44 17 50 11 16 4 24 29 12 36
27 32 12 63 4 30 19 60 28 39
22 40 19 16 23 2 46 31 52 41
13 2 42 17 29 39 43 9 20 50
45 40 38 33 17 45 28 6 48 12
56 26 34 33 35 40 28 44 22 10
50 55 49 43 63 62 22 50 15 52
48 54 53 26 4 53 13 56 42 60
49 30 14 55 29 62 15 13 35 40
22 38 37 36 10 36 5 41 43 53
Counts:
X X X
X XX X X XX X X X X X
X X X XX XXX X X X XXX X XX XXXXXXXX XXX XX XX X XX
X XXX XXXXXXXXX XX XXX XXXXXXXXXXXXXXXXXXXXX XXX XXXXX X XX
----------------------------------------------------------------
1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4 5 5 5 5 5 6 6
0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2
Second run:
Random numbers:
41 31 16 40 1 51 17 41 27 46
24 14 21 33 25 43 4 36 1 14
40 22 11 22 30 19 23 63 39 61
8 55 40 6 21 13 55 13 3 52
17 52 53 53 7 21 47 13 45 57
25 27 30 48 38 55 55 22 61 11
11 28 45 63 43 0 41 51 15 2
33 2 46 14 35 41 5 2 11 37
28 56 15 7 18 12 57 36 59 51
42 5 46 32 10 8 0 46 12 9
Counts:
X X X X
X X XX XX XX X X X
XXX X XX XXXXX X XX X XX X X X XX X XX XXX X X X X
XXXXXXXXXXXXXXXXXXXX XXXXX XX XXXX XXXXXXXXX XXXX XXX XXX X X X
----------------------------------------------------------------
1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4 5 5 5 5 5 6 6
0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2
You could try this with your favourite random number generator, playing around with the size of the distribution. You'll get the same sort of shape.
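For example, a minimal C# sketch of the same experiment (the output format is arbitrary) that throws 100 uniform values into 64 buckets and prints the counts:

using System;

class BucketExperiment
{
    static void Main()
    {
        var rng = new Random(); // any uniform RNG will do
        int[] buckets = new int[64];

        // Draw 100 uniform numbers in the range 0-63, as in the runs above.
        for (int i = 0; i < 100; i++)
            buckets[rng.Next(64)]++;

        // Crude histogram: one line per bucket.
        for (int b = 0; b < buckets.Length; b++)
            Console.WriteLine("{0,2}: {1}", b, new string('X', buckets[b]));
    }
}

Run it a few times; the counts show the same triplets and quadruplets as the hash-based version.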

Calculating change in column over groups and extracting based on criteria

I am a beginner at coding in U-SQL/C#. I am stuck on a windowing/aggregation step.
My data looks like:
Name Date OrderNo Type Balance
one 2018-06-25T04:55:44.0020987Z 1 Drink 15
one 2018-06-25T04:57:44.0020987Z 1 Drink 70
one 2018-06-25T04:59:44.0020987Z 1 Drink 33
one 2018-06-25T04:59:49.0020987Z 1 Drink 25
two 2018-06-25T04:55:44.0020987Z 2 Drink 22
two 2018-06-25T04:57:44.0020987Z 2 Drink 81
two 2018-06-25T04:58:44.0020987Z 2 Drink 33
two 2018-06-25T04:59:44.0020987Z 2 Drink 45
In U-SQL I am adding a unique id based on the combination of name, orderno and type, and for sorting purposes I am adding another one that also includes the date.
#files =
    EXTRACT
        name string,
        date DateTime,
        type string,
        orderno int,
        balance int
    FROM #InputFile
    USING new JsonExtractor();
#files2 =
    SELECT *,
           DENSE_RANK() OVER(ORDER BY name, type, orderno, date) AS group_id,
           DENSE_RANK() OVER(ORDER BY name, type, orderno) AS id
    FROM #files;
My Data now looks like this:
Name Date OrderNo Type Balance group_id id
one 2018-06-25T04:55:44.0020987Z 1 Drink 15 1 1
one 2018-06-25T04:57:44.0020987Z 1 Drink 70 2 1
one 2018-06-25T04:59:44.0020987Z 1 Drink 33 3 1
one 2018-06-25T04:59:49.0020987Z 1 Drink 25 4 1
two 2018-06-25T04:55:44.0020987Z 2 Drink 22 5 2
two 2018-06-25T04:57:44.0020987Z 2 Drink 81 6 2
two 2018-06-25T04:58:44.0020987Z 2 Drink 33 7 2
two 2018-06-25T04:59:44.0020987Z 2 Drink 45 8 2
(I have added only 4 records per group but there are multiple per group)
I am stuck at determining the difference between successive rows in the balance column in each group.
Expected Output for Part 1:
Name Date OrderNo Type Balance group_id id increase
one 2018-06-25T04:55:44.0020987Z 1 Drink 15 1 1 0
one 2018-06-25T04:57:44.0020987Z 1 Drink 70 2 1 55
one 2018-06-25T04:59:44.0020987Z 1 Drink 33 3 1 -37
one 2018-06-25T04:59:49.0020987Z 1 Drink 25 4 1 -8
two 2018-06-25T04:55:44.0020987Z 2 Drink 22 5 2 0
two 2018-06-25T04:57:44.0020987Z 2 Drink 81 6 2 59
two 2018-06-25T04:58:44.0020987Z 2 Drink 33 7 2 -48
two 2018-06-25T04:59:44.0020987Z 2 Drink 45 8 2 12
For every new group (defined by id) the increase should start from zero.
I went through Stack Overflow and saw the LAG function from PostgreSQL. I could not find a C# equivalent. Is that applicable in this case?
Any help is appreciated. Further clarification will be provided if required.
Update: When I use CASE WHEN, my solution looks like this:
CURRENT OUTPUT              DESIRED OUTPUT
id  Balance  Increase       id  Balance  Increase
1   15       0              1   15       0
1   70       55             1   70       55
1   33       -37            1   33       -37
1   25       -8             1   25       -8
2   22       "-3"           2   22       "0"
2   81       59             2   81       59
2   33       -48            2   33       -48
2   45       12             2   45       12
Look at the highlighted row. The increase column must start at 0 for each id.
Update: I was able to solve the first part of my question. See my answer below.
The second part that I had posted earlier was incorrectly posted. I have removed that.
You can try using the LAG window function to get the previous Balance in a subquery, then use WHERE to write the condition.
SELECT *
FROM (
    SELECT *,
           DENSE_RANK() OVER(ORDER BY name, type, orderno, date) AS group_id,
           DENSE_RANK() OVER(ORDER BY name, type, orderno) AS id,
           (CASE WHEN LAG(Balance) OVER(ORDER BY name, type, orderno) IS NULL THEN 0
                 ELSE Balance - LAG(Balance) OVER(ORDER BY name, type, orderno)
            END) AS increase
    FROM #files
) t1
WHERE increase > 50
The query that finally worked for me was this:
#files =
    EXTRACT
        name string,
        date DateTime,
        type string,
        orderno int,
        balance int
    FROM #InputFile
    USING new JsonExtractor();
#files2 =
    SELECT *,
           DENSE_RANK() OVER(ORDER BY name, type, orderno) AS group_id
    FROM #files;
#files3 =
    SELECT *,
           DENSE_RANK() OVER(PARTITION BY group_id ORDER BY date) AS group_order
    FROM #files2;
#files4 =
    SELECT *,
           (CASE WHEN group_order == 1 THEN 0
                 ELSE balance - LAG(balance) OVER(ORDER BY name, type, orderno)
            END) AS increase
    FROM #files3;
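For the "C# equivalent of LAG" part of the question: in plain C#/LINQ you can get the same per-group running difference by grouping the rows and comparing each row with its predecessor inside the group. A minimal sketch, assuming the rows are already in memory in a hypothetical Row class (class and property names invented for illustration):

using System;
using System.Collections.Generic;
using System.Linq;

class Row
{
    public string Name;
    public DateTime Date;
    public int OrderNo;
    public string Type;
    public int Balance;
}

class Program
{
    static void Main()
    {
        var rows = new List<Row>
        {
            new Row { Name = "one", Date = DateTime.Parse("2018-06-25T04:55:44Z"), OrderNo = 1, Type = "Drink", Balance = 15 },
            new Row { Name = "one", Date = DateTime.Parse("2018-06-25T04:57:44Z"), OrderNo = 1, Type = "Drink", Balance = 70 },
            new Row { Name = "two", Date = DateTime.Parse("2018-06-25T04:55:44Z"), OrderNo = 2, Type = "Drink", Balance = 22 },
            new Row { Name = "two", Date = DateTime.Parse("2018-06-25T04:57:44Z"), OrderNo = 2, Type = "Drink", Balance = 81 },
        };

        // Group by (name, type, orderno) -- the same key as the U-SQL id --
        // order each group by date, and take the difference from the previous
        // row. The i == 0 branch plays the role of the group_order == 1 reset.
        var result = rows
            .GroupBy(r => new { r.Name, r.Type, r.OrderNo })
            .SelectMany(g =>
            {
                var ordered = g.OrderBy(r => r.Date).ToList();
                return ordered.Select((r, i) => new
                {
                    r.Name, r.Date, r.Balance,
                    Increase = i == 0 ? 0 : r.Balance - ordered[i - 1].Balance
                });
            });

        foreach (var r in result)
            Console.WriteLine($"{r.Name} {r.Balance} {r.Increase}");
    }
}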

Is it possible to pivot a SQL Server table with multiple data columns

I have a table like this to be exported to Excel
name  date        value1  value2  value3
A     09/09/2015  5       10      2
B     09/09/2015  6       6       22
C     09/09/2015  4       3       11
A     10/09/2015  15      1       2
B     10/09/2015  6       16      27
C     10/09/2015  4       31      11
A     11/09/2015  15      1       2
B     11/09/2015  6       16      27
C     11/09/2015  4       31      11
Can we pivot this to something like the following (using SQL or a C# DataTable)?
09/09/2015 | 10/09/2015 | 11/09/2015
value1 value2 value3 | value1 value2 value3 | value1 value2 value3
A 5 10 2 | 15 1 2 | 15 1 2
B 6 6 22 | 6 16 27 | 6 16 27
C 4 3 11 | 4 31 11 | 4 31 11
Since you're exporting this to Excel, why not summarize the table in Excel as a PivotTable? From there, put the name column into the row box, the date column into the column box, and the values into the value box.
After that you can insert some nice charts, and then you have a nicer view of the data, if you ask me.
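If you do want to pivot in code before the export, here is a minimal C# sketch using a DataTable. The Pivot helper is invented for illustration, not a built-in DataTable feature; the column names follow the question, and each output column name combines a date with value1/value2/value3, mirroring the two-level header above:

using System;
using System.Data;
using System.Linq;

class PivotExample
{
    // Hypothetical helper: one output row per name,
    // one output column per (date, valueN) pair.
    static DataTable Pivot(DataTable source)
    {
        string[] valueCols = { "value1", "value2", "value3" };
        var result = new DataTable();
        result.Columns.Add("name", typeof(string));

        // AsEnumerable()/Field<T>() need a reference to System.Data.DataSetExtensions.
        var dates = source.AsEnumerable()
                          .Select(r => r.Field<DateTime>("date"))
                          .Distinct().OrderBy(d => d).ToList();
        foreach (var d in dates)
            foreach (var v in valueCols)
                result.Columns.Add(d.ToString("dd/MM/yyyy") + " " + v, typeof(int));

        foreach (var g in source.AsEnumerable().GroupBy(r => r.Field<string>("name")))
        {
            var row = result.NewRow();
            row["name"] = g.Key;
            foreach (var r in g)
            {
                string d = r.Field<DateTime>("date").ToString("dd/MM/yyyy");
                foreach (var v in valueCols)
                    row[d + " " + v] = r.Field<int>(v);
            }
            result.Rows.Add(row);
        }
        return result;
    }

    static void Main()
    {
        var t = new DataTable();
        t.Columns.Add("name", typeof(string));
        t.Columns.Add("date", typeof(DateTime));
        t.Columns.Add("value1", typeof(int));
        t.Columns.Add("value2", typeof(int));
        t.Columns.Add("value3", typeof(int));
        t.Rows.Add("A", new DateTime(2015, 9, 9), 5, 10, 2);
        t.Rows.Add("A", new DateTime(2015, 9, 10), 15, 1, 2);

        var p = Pivot(t);
        foreach (DataColumn c in p.Columns) Console.Write(c.ColumnName + " | ");
        Console.WriteLine();
    }
}

The pivoted DataTable can then be exported to Excel the same way as the original one.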

Query to select all rows with a condition

I need to make a query that selects all rows belonging to an ID from the data table where the lessons for that ID include two specific values. For example, IDs 2 and 3 both have lessons D and E, so I want to take all the rows of ID 2 and ID 3.
My data is in MS Access.
My programming language is C#.
I have already tried to write a query:
"SELECT * FROM MainData WHERE ID IN ( SELECT ID FROM MainData GROUP BY ID HAVING (Lesson ='" + txtbox_startsubject.Text + "' and '" + TargetPoint + "'))";
The format of data:
ID Lesson Time Score
1 C 165 4
1 E 190 3
1 H 195 3
1 I 200 4
2 A 100 2
2 B 150 5
2 D 210 2
2 R 10 3
2 E 110 4
3 D 130 5
3 E 190 5
3 H 210 4
3 I 160 4
3 J 110 4
4 E 120 3
4 H 150 4
4 J 170 4
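One common way to express "ID has both lessons" is GROUP BY ID with a HAVING COUNT over the matching lessons. A minimal C# sketch of that approach, using a parameterized OleDb query instead of string concatenation (the connection string is a placeholder, "D"/"E" stand in for txtbox_startsubject.Text and TargetPoint, and it assumes each ID has at most one row per lesson):

using System;
using System.Data;
using System.Data.OleDb;

class Program
{
    static void Main()
    {
        // Placeholder connection string -- point it at your Access file.
        string connStr = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=MainData.accdb";

        // IDs qualify when they have rows for BOTH lessons: COUNT(*) = 2
        // works if an ID has at most one row per lesson.
        string sql = @"
            SELECT * FROM MainData
            WHERE ID IN (
                SELECT ID FROM MainData
                WHERE Lesson IN (?, ?)
                GROUP BY ID
                HAVING COUNT(*) = 2)";

        using (var conn = new OleDbConnection(connStr))
        using (var cmd = new OleDbCommand(sql, conn))
        {
            // OleDb uses positional '?' parameters.
            cmd.Parameters.AddWithValue("?", "D");
            cmd.Parameters.AddWithValue("?", "E");

            var table = new DataTable();
            new OleDbDataAdapter(cmd).Fill(table);

            foreach (DataRow row in table.Rows)
                Console.WriteLine($"{row["ID"]} {row["Lesson"]} {row["Time"]} {row["Score"]}");
        }
    }
}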

Fastest way of reading txt files in C#

I'm working with a project and I'm a little bit confused. I've got some txt files from my teacher (from his site: wt40.txt, wt50.txt, wt100.txt).
Every file's structure looks similar:
26 24 79 46 32 35 73 74 14 67 86 46 78 40 29 94 64 27 90 55
35 52 36 69 85 95 14 78 37 86 44 28 39 12 30 68 70 9 49 50
1 10 9 10 10 4 3 2 10 3 7 3 1 3 10 4 7 7 4 7
5 3 5 4 9 5 2 8 10 4 7 4 9 5 7 7 5 10 1 3
Every number has 6 chars, but instead of leading zeros there are spaces.
At every line there are 20 numbers.
File wt40.txt should be read as: the first two lines go to the first List, the next two lines to the second List, and the third pair of lines to the third List. The following lines should again be put, in pairs, into those Lists.
In C++ I'm doing it in this simple way:
for (int ins = 0; ins < 125; ins++) // 125 instances in file
{
    for (int i = 0; i < N; i++) file >> tasks[i].p; // N elements in the first two lines
    for (int i = 0; i < N; i++) file >> tasks[i].w;
    for (int i = 0; i < N; i++) file >> tasks[i].d;
    for (int i = 0; i < N; i++) tasks[i].putToLists();
}
But when I'm writing this in C# I have to open a StreamReader, read every line, split it with a regexp, parse the pieces to int and add them to lists. That's a lot of loops.
I cannot read every 6 chars and add them in three loops, because those text files have messed-up end-of-line chars - sometimes it's just '\n', sometimes something more.
Isn't there any simpler way?
There is essentially an n-row by 20-column table of numbers, each 6 characters wide with leading spaces.
26 24 79 46 32 35 73 74 14 67 86 46 78 40 29 94 64 27 90 55
35 52 36 69 85 95 14 78 37 86 44 28 39 12 30 68 70 9 49 50
1 10 9 10 10 4 3 2 10 3 7 3 1 3 10 4 7 7 4 7
5 3 5 4 9 5 2 8 10 4 7 4 9 5 7 7 5 10 1 3
I don't understand the last sentence:
File wt40.txt should be read as: first two lines to first List, next
two lines to next List and third pair of lines to the third list. Next
lines again should be put in pairs to those Lists.
Say you want to get the first 6 rows and create 3 lists, each with 2 rows; you could do something like the following. It is eager in that it reads everything into memory and then does its work.
const int maxNumberDigitLength = 6;
const int numbersPerRow = 20;
const int rowsToRead = 6; // three lists of two rows each
const int rowLengthInChars = maxNumberDigitLength * numbersPerRow;
const int totalNumberOfCharsToRead = rowLengthInChars * rowsToRead;
char[] buffer = new char[totalNumberOfCharsToRead];
using (StreamReader reader = new StreamReader("wt40.txt"))
{
    // NB: this assumes no end-of-line chars inside the counted range,
    // which the question says is not guaranteed.
    int numberOfCharsRead = reader.Read(buffer, 0, totalNumberOfCharsToRead);
}
// Put them in your lists: two rows of chars per list.
IEnumerable<char> l1 = buffer.Take(rowLengthInChars * 2);
IEnumerable<char> l2 = buffer.Skip(rowLengthInChars * 2).Take(rowLengthInChars * 2);
IEnumerable<char> l3 = buffer.Skip(rowLengthInChars * 4).Take(rowLengthInChars * 2);
// Get the list of strings from the list of chars, non-LINQ method:
// collect 6 chars at a time and flush them into the list.
List<string> list1 = new List<string>();
StringBuilder sb = new StringBuilder();
foreach (char c in l1)
{
    sb.Append(c);
    if (sb.Length == maxNumberDigitLength)
    {
        list1.Add(sb.ToString());
        sb.Clear();
    }
}
// LINQ method:
string s = string.Concat(l1);
List<string> list1Linq = Enumerable
    .Range(0, s.Length / maxNumberDigitLength)
    .Select(i => s.Substring(i * maxNumberDigitLength, maxNumberDigitLength))
    .ToList();
// Parse to ints using LINQ projection; int.Parse tolerates the leading spaces.
List<int> numbers1 = list1.Select(int.Parse).ToList();
// (build list2 and list3 from l2 and l3 the same way, then parse them too)
Isn't there any more simple way?
Don't know if it's simpler, but there is only one loop and a bit of LINQ:
// requires: using System.Text.RegularExpressions; using System.Linq;
List<List<int>> lists = new List<List<int>>();
using (StreamReader reader = new StreamReader("wt40.txt"))
{
    string line;
    int count = 0;
    while ((line = reader.ReadLine()) != null)
    {
        List<int> currentList =
            Regex.Split(line, "\\s")
                 .Where(s => !string.IsNullOrWhiteSpace(s))
                 .Select(int.Parse).ToList();
        if (currentList.Count > 0) // skip empty lines
        {
            if (count % 2 == 0) // first line of a pair starts a new list
            {
                lists.Add(currentList);
            }
            else // second line of the pair is appended to it
            {
                lists[count / 2].AddRange(currentList);
            }
            count++; // count only non-empty lines so the pairing stays intact
        }
    }
}
In total you end up with 375 lists, each containing 40 numbers (at least for the wt40.txt input).
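To map those lists back onto the structure from the question's C++ snippet (each instance contributing a p, a w and a d list, in that order), every consecutive triple of lists is one instance. A small follow-up sketch under that assumption, continuing from the lists variable above:

// Each instance occupies three consecutive lists, in the same order
// as the C++ code reads them: p, then w, then d.
for (int ins = 0; ins < lists.Count / 3; ins++)
{
    List<int> p = lists[3 * ins];     // the 'p' values of instance ins
    List<int> w = lists[3 * ins + 1]; // the 'w' values
    List<int> d = lists[3 * ins + 2]; // the 'd' values
    // ... use p, w and d for instance 'ins'
}

With 375 lists, that yields the 125 instances mentioned in the question.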
