Sort several duplicate values in DataTable to one row

I've imported a DataTable from a SQL Database using SqlDataAdapter and Fill-Method.
My datatable looks like this:
Timestamp(unix time) | Value
x | 10
x | 42
x | 643
y | 5
y | 9
y | 70
...and so on. The table contains a lot of values (1000+), but there are always three rows with the same timestamp.
Now I want it to look like this:
Timestamp(unix time) | Value 1 | Value 2 | Value 3
x | 10 | 42 | 643
y | 5 | 9 | 70
How can I sort it this way?
(If there are more than three values, the program should just insert the first three values it finds.)
Thanks for any help!

Thanks for your approach! I solved it myself now.
This is how I've done it:
var grouped = from myRow in myDataTable.AsEnumerable()
              group myRow by myRow.Field<int>("TIMESTAMP");
foreach (var timestamp in grouped)
{
    string[] myRow = new string[5];
    myRow[0] = timestamp.Key.ToString();
    int i = 1;
    foreach (var value in timestamp)
    {
        myRow[i] = value.Field<double>("VALUE").ToString();
        i++;
        if (i > 4)
            break;
    }
    mySortedTable.Rows.Add(myRow);
}

I think this may also be solvable in SQL, but if you want to do it programmatically, I have tested the following in LinqPad:
void Main()
{
    var list = new List<Tuple<string,int>> {
        Tuple.Create("x", 10),
        Tuple.Create("x", 42),
        Tuple.Create("x", 643),
        Tuple.Create("y", 5),
        Tuple.Create("y", 9),
        Tuple.Create("y", 70),
    };
    var result =
        from grp in list.GroupBy(t => t.Item1)
        let firstThree = grp.Select(t => t.Item2).Take(3).ToList()
        select new {
            Key = grp.Key,
            Value1 = firstThree[0],
            Value2 = firstThree[1],
            Value3 = firstThree[2] };
    foreach (var item in result)
        Console.WriteLine(item);
}
It assumes that every group has at least three elements; otherwise indexing into firstThree throws an ArgumentOutOfRangeException.
While the end result is an anonymous type, you could easily pipe the results of the operation into a DataRow instead.
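As a rough sketch of that last step, the grouping can write straight into a new DataTable while tolerating groups with fewer than three values. The helper name Pivot, the class TimestampPivot, and the output column names VALUE1..VALUE3 are made up for illustration; the int TIMESTAMP and double VALUE column types are taken from the asker's own snippet.

```csharp
using System;
using System.Data;
using System.Linq;

static class TimestampPivot
{
    // Hypothetical helper: turns (TIMESTAMP, VALUE) rows into one row per timestamp.
    // Assumes TIMESTAMP is an int column and VALUE is a double column.
    public static DataTable Pivot(DataTable source, int maxValues = 3)
    {
        var result = new DataTable();
        result.Columns.Add("TIMESTAMP", typeof(int));
        for (int n = 1; n <= maxValues; n++)
            result.Columns.Add("VALUE" + n, typeof(double));

        // GroupBy on an in-memory sequence keeps groups in order of first appearance.
        foreach (var group in source.AsEnumerable().GroupBy(r => r.Field<int>("TIMESTAMP")))
        {
            DataRow row = result.NewRow();
            row["TIMESTAMP"] = group.Key;

            int column = 1; // column 0 is TIMESTAMP
            // Take() drops any values beyond maxValues; missing values stay DBNull.
            foreach (double value in group.Select(r => r.Field<double>("VALUE")).Take(maxValues))
                row[column++] = value;

            result.Rows.Add(row);
        }
        return result;
    }
}
```

Because the values are copied in a loop rather than read by fixed index, a timestamp with only one or two values simply leaves the remaining cells empty instead of throwing.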

Related

LINQ OrderBy only on second property

I have a list like
| column1 | column2 |
| 1 | 72 |
| 2 | 30 |
| 3 | 27 |
| 3 | 38 |
| 4 | 72 |
As you can see, the list is already sorted on column1. My goal is to apply an OrderByDescending on column2, but only among rows that have equal column1 values. Basically, I want
| column1 | column2 |
| 1 | 72 |
| 2 | 30 |
| 3 | 38 |
| 3 | 27 |
| 4 | 72 |
I can't re-run the first OrderBy (that would be hard to explain and doesn't matter here, I just can't :D), so forget about
list.OrderBy(e => e.column1).ThenByDescending(e => e.column2)
In fact, I wouldn't have any problem if I could simply do a .ThenByDescending(e => e.column2) without having to do the .OrderBy first. (Maybe I can run an "empty" OrderBy that won't change the sort? Then I would be able to do the ThenByDescending?)

list.OrderBy(e => e.column1).ThenByDescending(e => e.column2) is still the way to go.
The algorithm has to know it has to sort on e.column1 first, no matter if it actually changes something or not. It has to know it only has to sort e.column2 within the subset of the first sorting statement. You can't do that with 'just' sorting on column2.
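A minimal in-memory sketch of that point, using the data from the question: LINQ to Objects performs a stable sort, so running OrderBy again on input that is already ordered by column1 leaves the group order untouched and only the ThenByDescending has a visible effect. The class and method names here are invented for the example, and the rows are modelled as value tuples rather than the asker's own type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SecondarySort
{
    // Hypothetical helper: primary key ascending, secondary key descending.
    public static List<(int Column1, int Column2)> Sort(
        IEnumerable<(int Column1, int Column2)> rows)
    {
        return rows
            .OrderBy(e => e.Column1)            // no visible change if already sorted
            .ThenByDescending(e => e.Column2)   // only reorders rows with equal Column1
            .ToList();
    }
}
```

Called with (1, 72), (2, 30), (3, 27), (3, 38), (4, 72), this returns the same rows with (3, 38) moved ahead of (3, 27).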

list.GroupBy(i => i.column1).SelectMany(i => i.OrderByDescending(g => g.column2))
This will work with many providers, but some may not preserve the ordering in the GroupBy. In such a case:
list.AsEnumerable().GroupBy(i => i.column1).SelectMany(i => i.OrderByDescending(g => g.column2))
This works by forcing the operation into memory (where GroupBy preserves the ordering), though with the disadvantage that all subsequent operations are done in memory rather than on a DB etc.

If you really DO have a list rather than an IEnumerable, it is actually possible to do this using the overload of List.Sort() which lets you specify a subset of items to sort, along with a comparer.
What you have to do is an O(N) traversal of the list to determine where each subgroup occurs according to the already-sorted column. Then sort each subgroup of more than one item according to the secondary sort column.
There's a small amount of fiddliness involved in selecting the keys that identify the subgroups and the keys used to sort the subgroups.
Here's the implementation:
public static void SortSubgroupsBy<T>
(
    List<T> items,
    Func<T, T, bool> sortedColumnComparer, // Used to compare the already-sorted column.
    Func<T, T, int> unsortedColumnComparer // Used to compare the unsorted column.
)
{
    var unsortedComparer = Comparer<T>.Create(
        (x, y) => unsortedColumnComparer(x, y));
    int i = 0;
    while (i < items.Count)
    {
        // Find the end (exclusive) of the subgroup that starts at i.
        int j = i + 1;
        while (j < items.Count && sortedColumnComparer(items[i], items[j]))
            ++j;
        if ((j - i) > 1)
            items.Sort(i, j - i, unsortedComparer);
        i = j; // Jump past the subgroup so each item is scanned only once.
    }
}
Here's a complete demonstration in a Console app:
using System;
using System.Collections.Generic;
namespace ConsoleApplication1
{
    class Item
    {
        public Item(int column1, int column2)
        {
            Column1 = column1;
            Column2 = column2;
        }
        public int Column1;
        public int Column2;
        public override string ToString()
        {
            return $"[{Column1}, {Column2}]";
        }
    }
    class Program
    {
        static void Main()
        {
            List<Item> items = new List<Item>
            {
                new Item(1, 72),
                new Item(2, 29),
                new Item(2, 30),
                new Item(3, 27),
                new Item(3, 38),
                new Item(3, 53),
                new Item(4, 72),
                new Item(4, 21),
                new Item(4, 86),
                new Item(4, 17),
                new Item(5, 90)
            };
            SortSubgroupsBy(
                items,
                (x, y) => x.Column1 == y.Column1, // Compare sorted column.
                (x, y) => y.Column2 - x.Column2); // Compare unsorted column (descending).
            Console.WriteLine(string.Join("\n", items));
        }
        public static void SortSubgroupsBy<T>
        (
            List<T> items,
            Func<T, T, bool> sortedColumnComparer, // Used to compare the already-sorted column.
            Func<T, T, int> unsortedColumnComparer // Used to compare the unsorted column.
        )
        {
            var unsortedComparer = Comparer<T>.Create(
                (x, y) => unsortedColumnComparer(x, y));
            int i = 0;
            while (i < items.Count)
            {
                // Find the end (exclusive) of the subgroup that starts at i.
                int j = i + 1;
                while (j < items.Count && sortedColumnComparer(items[i], items[j]))
                    ++j;
                if ((j - i) > 1)
                    items.Sort(i, j - i, unsortedComparer);
                i = j; // Jump past the subgroup so each item is scanned only once.
            }
        }
    }
}

Improve SQL query to calculate timespan between two consecutive rows

So... I have a table like this:
RowID | DocID | Time | DepartmentID
1 | 1001 | 2015-11-20 | 1
2 | 1001 | 2015-11-21 | 2
3 | 1002 | 2015-11-20 | 1
4 | 1001 | 2015-11-25 | 1
5 | 1002 | 2015-11-22 | 3
6 | 1002 | 2015-11-30 | 1
My goal is to get the time in days that a department spends with a document before sending it to another department.
I successfully achieved this by passing the above table from SQL to a DataTable in C#. I then build a list of DocIDs, iterate through each item in that list, filter the rows by DocID, and only then calculate the time between consecutive rows.
So the final result looks like:
DepartmentID | DocID | Time (Days)
1 | 1001 | 2
2 | 1001 | 5
1 | 1002 | 3
3 | 1002 | 9
The problem is that this function in C# takes about 30 seconds to produce these results, so I'm looking for ways to improve it.
Is it possible to get this through SQL only, without doing anything in C#?
My C# function (dt is a DataTable containing the first table):
List<Int32> listDocIDs = new List<Int32>();
foreach (DataRow dr in dt.Rows)
{
    int str = Convert.ToInt32(dr["DocID"].ToString());
    if (!listDocIDs.Contains(str))
        listDocIDs.Add(str);
}
DataTable times = new DataTable();
times.Columns.AddRange(new DataColumn[3] { new DataColumn("DepartmentID", typeof(Int32)),
                                           new DataColumn("DocID", typeof(Int32)),
                                           new DataColumn("Days", typeof(Int32)) });
foreach (int DocID in listDocIDs)
{
    DataTable DocID_times = new DataTable();
    // One database round trip per document.
    using (SqlConnection conn = new SqlConnection(strCon))
    {
        conn.Open();
        SqlDataAdapter adapter = new SqlDataAdapter("getRecordsByDocID", conn);
        adapter.SelectCommand.Parameters.Add("@DocID", SqlDbType.Int).Value = DocID;
        adapter.SelectCommand.CommandType = CommandType.StoredProcedure;
        adapter.Fill(DocID_times);
        conn.Close();
    }
    int j = 0;
    for (int i = 0; i < DocID_times.Rows.Count; i++)
    {
        j = i + 1;
        if (i < (DocID_times.Rows.Count - 1))
        {
            // Assumes the stored procedure also returns the DepartmentID column.
            int DepartmentID = Convert.ToInt32(DocID_times.Rows[i]["DepartmentID"]);
            DateTime tempo1 = DateTime.ParseExact(DocID_times.Rows[i]["Time"].ToString(), "dd-MM-yyyy HH:mm:ss",
                System.Globalization.CultureInfo.InvariantCulture);
            DateTime tempo2 = DateTime.ParseExact(DocID_times.Rows[j]["Time"].ToString(), "dd-MM-yyyy HH:mm:ss",
                System.Globalization.CultureInfo.InvariantCulture);
            double mins = (tempo2 - tempo1).TotalMinutes;
            TimeSpan result = TimeSpan.FromMinutes(mins);
            double days = result.TotalDays;
            var rows = times.Select(string.Format("DepartmentID = {0} AND DocID = {1}", DepartmentID, DocID));
            if (rows.Length == 0)
            {
                // Add your Row
                times.Rows.Add(DepartmentID, DocID, days);
            }
            else
            {
                // Update your Days
                rows[0]["days"] = Convert.ToInt32(rows[0]["days"].ToString()) + days;
            }
        }
    }
}

If you're listing all the rows, I would calculate the days between records inside a while loop. It can be done purely with SQL, but it won't be as good as the while loop (which can have access to two rows at a time). To be able to do it purely in SQL, you would have to join the table with itself, joining each record with the next one.
IEnumerable<MySummarizedRow> GetSummarizedRows()
{
    using (var entries = GetRowsOrderedByDocIdAndRowId().GetEnumerator())
    {
        if (entries.MoveNext())
        {
            var previous = entries.Current;
            while (entries.MoveNext())
            {
                var current = entries.Current;
                // Only pair up consecutive rows that belong to the same document.
                if (current.DocId == previous.DocId)
                    yield return new MySummarizedRow(previous.DepartmentId, current.DocId, current.Time.Subtract(previous.Time).TotalDays + 1);
                previous = current;
            }
        }
    }
}
This function ignores the last row of each document, i.e. the department that currently holds it and hasn't passed it on yet. You can easily change that by yielding a new row with -1 days or something like that.
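For reference, here is a self-contained sketch of the same single-pass idea over an in-memory list, so that no per-document query is needed. The Entry and Summary record types and the Durations class are invented for this example (records need C# 9 or later), and the + 1 is kept from the answer above so that the figures match the expected table in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes for one source row and one result row.
record Entry(int RowId, int DocId, DateTime Time, int DepartmentId);
record Summary(int DepartmentId, int DocId, double Days);

static class Durations
{
    public static IEnumerable<Summary> Summarize(IEnumerable<Entry> entries)
    {
        Entry previous = null;
        // Order once, then compare each row with the one before it.
        foreach (var current in entries.OrderBy(e => e.DocId).ThenBy(e => e.RowId))
        {
            if (previous != null && previous.DocId == current.DocId)
                yield return new Summary(
                    previous.DepartmentId,
                    current.DocId,
                    (current.Time - previous.Time).TotalDays + 1);
            previous = current;
        }
    }
}
```

With the six sample rows from the question, this yields (1, 1001, 2), (2, 1001, 5), (1, 1002, 3) and (3, 1002, 9). The rows could come from a single query that returns the whole table, which avoids opening one connection per DocID.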

Compare 2 Datatables to find difference/accuracy between the columns

So, I have 2 separate DataTables that look nearly identical, but the values in their rows might differ.
EDIT:
I can get a unique ID by creating a temporary identity column that can be used as a primary key, if that makes it easier. So think of the ID column as the primary key then.
Table A
ID | Name | Value1 | Value2 | Value3
-------------------------------------
1 | Bob | 50 | 150 | 35
2 | Bill | 55 | 47 | 98
3 | Pat | 10 | 15 | 45
4 | Cat | 70 | 150 | 35
Table B
ID | Name | Value1 | Value2 | Value3
-------------------------------------
1 | Bob | 30 | 34 | 67
2 | Bill | 55 | 47 | 98
3 | Pat | 100 | 15 | 45
4 | Cat | 70 | 100 | 20
Output Should be:
Table C
ID | Name | TableAValue1 | TableBValue1 | DiffValue1 | ... (same three columns for Value2 and Value3)
------------------------------------------------------
1 | Bob | 50 | 30 | 20
2 | Bill | 55 | 55 | 0
3 | Pat | 10 | 100 | 90
4 | Cat | 70 | 70 | 0
I know the tedious way to do this is a for loop that goes through each row and compares the columns with each other, but I am not sure how to create a new Table C with the results I want. I also think there might be a simpler solution using LINQ, which I am not very familiar with; I would be interested in a LINQ solution if it is faster and takes fewer lines of code. I am looking for the most efficient way of going about this, as these DataTables can be anywhere from 5,000 to 15,000+ rows in size, so memory usage becomes an issue.

LINQ is not faster, at least not in general. But it can help to increase readability.
You can use Enumerable.Join which might be more efficient than nested loops, but you need a loop to fill your third table anyway. So the first two columns are the identifiers and the rest are the values:
var query = from r1 in table1.AsEnumerable()
            join r2 in table2.AsEnumerable()
            on new { ID = r1.Field<int>("ID"), Name = r1.Field<string>("Name") }
            equals new { ID = r2.Field<int>("ID"), Name = r2.Field<string>("Name") }
            select new { r1, r2 };
var columnsToCompare = table1.Columns.Cast<DataColumn>().Skip(2);
foreach (var rowInfo in query)
{
    var row = table3.Rows.Add();
    row.SetField("ID", rowInfo.r1.Field<int>("ID"));
    row.SetField("Name", rowInfo.r1.Field<string>("Name"));
    foreach (DataColumn col in columnsToCompare)
    {
        int val1 = rowInfo.r1.Field<int>(col.ColumnName);
        int val2 = rowInfo.r2.Field<int>(col.ColumnName);
        int diff = Math.Abs(val1 - val2);
        row.SetField(col.ColumnName, diff);
    }
}

var tableC = new DataTable();
tableC.Columns.Add(new DataColumn("ID"));
tableC.Columns.Add(new DataColumn("Name"));
tableC.Columns.Add(new DataColumn("TableAValue1"));
tableC.Columns.Add(new DataColumn("TableBValue1"));
tableC.Columns.Add(new DataColumn("DiffValue1"));
foreach (DataRow rowA in tableA.Rows)
{
    foreach (DataRow rowB in tableB.Rows)
    {
        if (Convert.ToInt32(rowA["ID"]) == Convert.ToInt32(rowB["ID"]) &&
            rowA["Name"].ToString() == rowB["Name"].ToString() &&
            Convert.ToInt32(rowA["Value1"]) != Convert.ToInt32(rowB["Value1"]))
        {
            var newRow = tableC.NewRow();
            newRow["ID"] = rowA["ID"];
            newRow["Name"] = rowA["Name"];
            newRow["TableAValue1"] = rowA["Value1"];
            newRow["TableBValue1"] = rowB["Value1"];
            newRow["DiffValue1"] = Convert.ToInt32(rowA["Value1"]) - Convert.ToInt32(rowB["Value1"]);
            tableC.Rows.Add(newRow);
        }
    }
}

Using LINQ, create an anonymous type as follows
var joinedRows = (from rowA in TableA.AsEnumerable()
                  from rowB in TableB.AsEnumerable()
                  where rowA.Field<String>("Name") == rowB.Field<String>("Name")
                  select new
                  {
                      ID = rowA.Field<int>("ID"),
                      Name = rowA.Field<String>("Name"),
                      TableAValue1 = rowA.Field<int>("Value1"),
                      TableBValue1 = rowB.Field<int>("Value1"),
                      DiffValue1 = Math.Abs(rowA.Field<int>("Value1") - rowB.Field<int>("Value1")),
                      TableAValue2 = rowA.Field<int>("Value2"),
                      TableBValue2 = rowB.Field<int>("Value2"),
                      DiffValue2 = Math.Abs(rowA.Field<int>("Value2") - rowB.Field<int>("Value2")),
                      TableAValue3 = rowA.Field<int>("Value3"),
                      TableBValue3 = rowB.Field<int>("Value3"),
                      DiffValue3 = Math.Abs(rowA.Field<int>("Value3") - rowB.Field<int>("Value3"))
                  });
Table.AsEnumerable() gives you an IEnumerable<DataRow>.
row.Field<T>() returns the column value cast to the type you ask for.
You can now iterate over joinedRows and create your new DataTable from the anonymous-type objects.

This uses a strategy similar to kippermand's, but will probably perform slightly better on large sets of data by avoiding the O(n²) complexity of checking every ID against every other ID, and by reusing the values extracted from the data table:
// joining by row location
var joinedTableRows =
    dt1.AsEnumerable().Zip(dt2.AsEnumerable(),
        (r1, r2) => new{r1, r2});
// or, joining by ID
var joinedTableRows2 =
    dt1.AsEnumerable().Join(dt2.AsEnumerable(),
        r => r.Field<int>("ID"),
        r => r.Field<int>("ID"),
        (r1, r2) => new{r1, r2});
var result =
    from row in joinedTableRows
    let rowA = row.r1
    let rowB = row.r2
    let tableAValue1 = rowA.Field<int>("Value1")
    let tableBValue1 = rowB.Field<int>("Value1")
    let tableAValue2 = rowA.Field<int>("Value2")
    let tableBValue2 = rowB.Field<int>("Value2")
    let tableAValue3 = rowA.Field<int>("Value3")
    let tableBValue3 = rowB.Field<int>("Value3")
    select new
    {
        ID = row.r1.Field<int>("ID"),
        Name = row.r1.Field<string>("Name"),
        TableAValue1 = tableAValue1,
        TableBValue1 = tableBValue1,
        DiffValue1 = Math.Abs(tableAValue1 - tableBValue1),
        TableAValue2 = tableAValue2,
        TableBValue2 = tableBValue2,
        DiffValue2 = Math.Abs(tableAValue2 - tableBValue2),
        TableAValue3 = tableAValue3,
        TableBValue3 = tableBValue3,
        DiffValue3 = Math.Abs(tableAValue3 - tableBValue3)
    };
Depending on how your data needs to be consumed, you could either declare a class matching this anonymous type, and consume that directly (which is what I'd prefer), or you can create a DataTable from these objects, if you have to.
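If a DataTable really is needed, one possible way to build Table C from an ID-based join is sketched below. The TableDiff class and Build method are invented names, the three value columns are hard-coded from the question, and all of ID and Value1..Value3 are assumed to be int columns.

```csharp
using System;
using System.Data;
using System.Linq;

static class TableDiff
{
    // Hypothetical helper: joins the two tables on ID and writes the comparison into a new table.
    public static DataTable Build(DataTable tableA, DataTable tableB)
    {
        string[] valueColumns = { "Value1", "Value2", "Value3" };

        var tableC = new DataTable();
        tableC.Columns.Add("ID", typeof(int));
        tableC.Columns.Add("Name", typeof(string));
        foreach (string name in valueColumns)
        {
            tableC.Columns.Add("TableA" + name, typeof(int));
            tableC.Columns.Add("TableB" + name, typeof(int));
            tableC.Columns.Add("Diff" + name, typeof(int));
        }

        // Enumerable.Join builds a lookup on one side, so rows are not compared pairwise.
        var pairs = tableA.AsEnumerable().Join(
            tableB.AsEnumerable(),
            a => a.Field<int>("ID"),
            b => b.Field<int>("ID"),
            (a, b) => new { a, b });

        foreach (var pair in pairs)
        {
            DataRow row = tableC.NewRow();
            row["ID"] = pair.a.Field<int>("ID");
            row["Name"] = pair.a.Field<string>("Name");
            foreach (string name in valueColumns)
            {
                int valueA = pair.a.Field<int>(name);
                int valueB = pair.b.Field<int>(name);
                row["TableA" + name] = valueA;
                row["TableB" + name] = valueB;
                row["Diff" + name] = Math.Abs(valueA - valueB);
            }
            tableC.Rows.Add(row);
        }
        return tableC;
    }
}
```

Rows whose ID exists in only one of the two tables are dropped by the inner join; whether that is acceptable depends on the data.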

Check occurrence of word appearing in datatable column

I have the data below in a DataTable (this is example data). I would like to get the number of occurrences of the codes 12 and 13 in the DataTable; normally there would be 10-20 million rows in it.
Customer | quantity | Product | Code
1 | 3 | Product | 12
2 | 4 | Product | 13
3 | 1 | Product | 12
4 | 6 | Product | 13

How about a simple foreach loop?
private int getCount(int yourSearchDigit)
{
    int counter = 0;
    foreach (DataRow dr in yourDataTable.Rows)
    {
        if (Convert.ToInt32(dr["Code"]) == yourSearchDigit)
            counter++;
    }
    return counter;
}

You can use Linq-To-DataTable:
int[] allowedCodes = new []{ 12, 13 };
var rows = table.AsEnumerable()
    .Where(r => allowedCodes.Contains(r.Field<int>("Code")));
However, if you have 10-20 million rows in the DataTable, you should consider doing the filtering in the database itself.
If you want to know how many times they occur:
int count = table.AsEnumerable()
    .Count(r => allowedCodes.Contains(r.Field<int>("Code")));
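If a separate count per code is wanted rather than one combined total, a grouping along these lines might do. CodeCounter and CountByCode are names made up for the sketch, and Code is assumed to be an int column as in the snippets above.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

static class CodeCounter
{
    // Hypothetical helper: returns code -> number of rows carrying that code.
    public static Dictionary<int, int> CountByCode(DataTable table, params int[] codes)
    {
        var wanted = new HashSet<int>(codes); // constant-time membership test
        return table.AsEnumerable()
            .Select(r => r.Field<int>("Code"))
            .Where(code => wanted.Contains(code))
            .GroupBy(code => code)
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

For the four sample rows, CountByCode(table, 12, 13) gives 2 for code 12 and 2 for code 13. A code that never occurs has no entry in the dictionary, so callers should use TryGetValue rather than the indexer.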

How to change specific column data in the datatable according to some condition

I have the following case:
I add rows one at a time to the DataTable dtItems, based on the user's data entry, through a button.
One of the columns in my DataTable is Hours, and I want to satisfy the following conditions:
1- For each user, the total hours must be less than or equal to 5.
2- The default: if the user enters one row, then hours = 5;
if he enters two rows, then the first one is 4 and the second one is 1;
if he enters three rows, then the first one is 3, the second is 1, and the third is 1;
etc.
3- The maximum number of rows for each user is 5.
Like this:
user_id | name | hours
323 | jo | 3
323 | jo | 1
323 | jo | 1
324 | jack | 4
324 | jack | 1
DataTable dtItems = GetDataTable();
DataRow dr = dtItems.NewRow();
dr["emp_num"] = txt_EmpNum.Text.Trim();
dr["name"] = txt_EmpName.Text.Trim();
dr["hours"] = 5;
dtItems.Rows.Add(dr);
GV_Employee.DataSource = dtItems;
GV_Employee.DataBind();
Session["ItemDT"] = dtItems;

I assume that you don't know how to change the DataRows' hours fields accordingly before you insert them into the database.
The only thing you need to work out is the new hours value of the first DataRow; all the others get 1 hour:
var firstHour = 5 + 1 - dtItems.Rows.Count; //where 5 is your MaxCount
for (var i = 0; i < dtItems.Rows.Count; i++) {
    if (i == 0)
        dtItems.Rows[i]["hours"] = firstHour;
    else
        dtItems.Rows[i]["hours"] = 1;
}
To prevent users from inserting more than 5 rows, you only need to check for dtItems.Rows.Count < 5 before you insert the new row.
Edit: If you need it to be calculated for every emp_num in the DataTable, as commented:
var q = from r in dtItems.AsEnumerable()
        group r by r["emp_num"];
foreach(var empGrp in q){
    var rows=empGrp.ToList();
    var firstHour = 5 + 1 - rows.Count;
    for (var i = 0; i < rows.Count; i++){
        if (i == 0)
            rows[i]["hours"] = firstHour;
        else
            rows[i]["hours"] = 1;
    }
}
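Putting the row limit and the redistribution together, a per-employee insert could look roughly like the sketch below. HoursAllocator and TryAddRow are hypothetical names, and the code assumes that emp_num and name are string columns and hours is an int column, which the question does not state.

```csharp
using System.Data;
using System.Linq;

static class HoursAllocator
{
    public const int MaxRowsPerUser = 5; // also the total number of hours per user

    // Hypothetical helper: adds a row for the employee unless the limit is reached,
    // then redistributes that employee's hours. Returns false if nothing was added.
    public static bool TryAddRow(DataTable dtItems, string empNum, string name)
    {
        var rows = dtItems.AsEnumerable()
            .Where(r => r.Field<string>("emp_num") == empNum)
            .ToList();
        if (rows.Count >= MaxRowsPerUser)
            return false;

        DataRow added = dtItems.NewRow();
        added["emp_num"] = empNum;
        added["name"] = name;
        added["hours"] = 1;
        dtItems.Rows.Add(added);
        rows.Add(added);

        // The first row absorbs whatever the one-hour rows leave over.
        rows[0]["hours"] = MaxRowsPerUser + 1 - rows.Count;
        for (int i = 1; i < rows.Count; i++)
            rows[i]["hours"] = 1;
        return true;
    }
}
```

Each successful call keeps the employee's total at 5 hours: one row gets 5, two rows get 4 and 1, and five rows get 1 each, after which further calls for that employee return false.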
