Check occurrence of word appearing in datatable column

I have the data below in a DataTable (this is example data). I would like to get the number of occurrences of 12 and 13 in the Code column. Normally there would be 10-20 million rows in the DataTable.
Customer | quantity | Product | Code
1 | 3 | Product | 12
2 | 4 | Product | 13
3 | 1 | Product | 12
4 | 6 | Product | 13

How about a simple foreach loop?
private int getCount(int yourSearchDigit)
{
    int counter = 0;
    // Walk every row and count those whose Code matches.
    foreach (DataRow dr in yourDataTable.Rows)
    {
        if (Convert.ToInt32(dr["Code"]) == yourSearchDigit)
            counter++;
    }
    return counter;
}

You can use Linq-To-DataTable:
int[] allowedCodes = new[] { 12, 13 };
var rows = table.AsEnumerable()
    .Where(r => allowedCodes.Contains(r.Field<int>("Code")));
However, if you have 10-20 million rows in the DataTable, you should consider doing the filtering in the database itself.
If you want to know how many times they occur:
int count = table.AsEnumerable()
    .Count(r => allowedCodes.Contains(r.Field<int>("Code")));
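If a separate count per code is wanted rather than one combined total, the same query can be grouped. The following is only a minimal sketch: the class and method names (CodeCounter, CountCodes) are hypothetical, and it assumes the Code column is typed as int. On .NET Framework, AsEnumerable() needs a reference to System.Data.DataSetExtensions.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical helper, not part of the original answers.
static class CodeCounter
{
    // Returns how many rows carry each of the wanted codes.
    // Codes that never occur are simply absent from the dictionary.
    public static Dictionary<int, int> CountCodes(DataTable table, IEnumerable<int> wantedCodes)
    {
        var wanted = new HashSet<int>(wantedCodes); // constant-time membership test
        return table.AsEnumerable()
            .Select(r => r.Field<int>("Code"))      // assumes an int-typed Code column
            .Where(code => wanted.Contains(code))
            .GroupBy(code => code)
            .ToDictionary(g => g.Key, g => g.Count());
    }

    static void Main()
    {
        // Sample data from the question.
        var table = new DataTable();
        table.Columns.Add("Customer", typeof(int));
        table.Columns.Add("quantity", typeof(int));
        table.Columns.Add("Product", typeof(string));
        table.Columns.Add("Code", typeof(int));
        table.Rows.Add(1, 3, "Product", 12);
        table.Rows.Add(2, 4, "Product", 13);
        table.Rows.Add(3, 1, "Product", 12);
        table.Rows.Add(4, 6, "Product", 13);

        foreach (var pair in CountCodes(table, new[] { 12, 13 }))
            Console.WriteLine($"Code {pair.Key}: {pair.Value} rows");
    }
}
```

With the sample rows from the question, both 12 and 13 are counted twice. This still scans every row in memory, so the advice about filtering in the database applies here as well.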

Related

Copy row from datatable to another where there are common column headers

I have two DataTables and I am trying to copy a row from one table to the other. The tables are not exactly the same: both have common headers, but the second table has more columns. I therefore need a "smart" copy, i.e. one that copies the row according to the column header names.
d1:
+--------+--------+--------+
| ID | aaa | bbb |
+--------+--------+--------+
| 23 | value1 | value2 | <----copy this row
d2:
+--------+--------+--------+--------+
| ID | ccc | bbb | aaa |
+--------+--------+--------+--------+
| 23 | | value2 | value1 | <----I need this result
but this code:
int rowID = 23;
DataRow[] result = dt1.Select($"ID = {rowID}");
dt2.Rows.Add(result[0].ItemArray);
gives:
d2:
+--------+--------+--------+--------+
| ID | ccc | bbb | aaa |
+--------+--------+--------+--------+
| 23 | value1 | value2 | | <---- :( NOT what I need

I think this is your homework, but here is a simple and not very smart solution:
private DataTable DTCopySample()
{
    int cnt = 0;
    // Source table with three columns.
    DataTable dt1 = new DataTable();
    dt1.Columns.Add("ID");
    dt1.Columns.Add("aaa");
    dt1.Columns.Add("bbb");
    // Target table: same headers in a different order, plus an extra column.
    DataTable dt2 = new DataTable();
    dt2.Columns.Add("ID");
    dt2.Columns.Add("ccc");
    dt2.Columns.Add("bbb");
    dt2.Columns.Add("aaa");
    dt1.Rows.Add();
    dt1.Rows[0]["ID"] = "23";
    dt1.Rows[0]["aaa"] = "val1";
    dt1.Rows[0]["bbb"] = "val2";
    dt1.Rows.Add();
    dt1.Rows[1]["ID"] = "99";
    dt1.Rows[1]["aaa"] = "val99";
    dt1.Rows[1]["bbb"] = "val98";
    foreach (DataRow row in dt1.Rows)
    {
        dt2.Rows.Add();
        // Copy each value into the target column with the same name.
        foreach (DataColumn col in dt1.Columns)
        {
            dt2.Rows[cnt][col.ColumnName] = row[col.ColumnName].ToString();
        }
        cnt++;
    }
    return dt2;
}
There are smarter and better solutions, but this one was written quickly (2 minutes) and works.
Remember that you have not specified column data types or anything else, so I assumed strings everywhere to keep the sample simple.
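For the single-row case in the question (copy only the row with ID 23), the name-based copy could be sketched as below. The helper name CopyByColumnName is hypothetical, and the column types (int for ID, string for the rest) are an assumption, since the question does not state them. Columns that exist only in the target are left as DBNull.

```csharp
using System;
using System.Data;

// Hypothetical helper, not part of the original answer.
static class RowCopier
{
    // Adds a copy of 'source' to 'target', matching columns by name, not position.
    public static DataRow CopyByColumnName(DataRow source, DataTable target)
    {
        DataRow copy = target.NewRow();
        foreach (DataColumn col in source.Table.Columns)
        {
            // Skip source columns that the target does not have.
            if (target.Columns.Contains(col.ColumnName))
                copy[col.ColumnName] = source[col];
        }
        target.Rows.Add(copy);
        return copy;
    }

    static void Main()
    {
        // Assumed schema: ID is an int, the other columns are strings.
        var dt1 = new DataTable();
        dt1.Columns.Add("ID", typeof(int));
        dt1.Columns.Add("aaa", typeof(string));
        dt1.Columns.Add("bbb", typeof(string));
        dt1.Rows.Add(23, "value1", "value2");

        var dt2 = new DataTable();
        dt2.Columns.Add("ID", typeof(int));
        dt2.Columns.Add("ccc", typeof(string));
        dt2.Columns.Add("bbb", typeof(string));
        dt2.Columns.Add("aaa", typeof(string));

        int rowId = 23;
        DataRow[] matches = dt1.Select($"ID = {rowId}");
        if (matches.Length > 0)
        {
            DataRow copied = CopyByColumnName(matches[0], dt2);
            Console.WriteLine(string.Join(" | ", copied.ItemArray));
        }
    }
}
```

Because the values are assigned as objects rather than converted with ToString(), typed columns keep their types as long as both tables declare the same type for each shared column.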

Improve SQL query to calculate timespan between two consecutive rows

So... I have a table like this:
RowID | DocID | Time | DepartmentID
1 | 1001 | 2015-11-20 | 1
2 | 1001 | 2015-11-21 | 2
3 | 1002 | 2015-11-20 | 1
4 | 1001 | 2015-11-25 | 1
5 | 1002 | 2015-11-22 | 3
6 | 1002 | 2015-11-30 | 1
My goal is to get the time in days that a department spends with a document before sending it to another department.
I achieved this by loading the table above from SQL into a DataTable in C#. I then get the list of DocIDs and iterate through each item in that list, filtering the DataTable by DocID, and only then calculate the time between consecutive rows.
So the final result looks like:
DepartmentID | DocID | Time (Days)
1 | 1001 | 2
2 | 1001 | 5
1 | 1002 | 3
3 | 1002 | 9
The problem is that this function in C# takes about 30 seconds to get these results, so I'm looking for ways to improve it.
Is it possible to get this through SQL only, without doing anything in C#?
My C# function (dt is a DataTable holding the first table):
List<Int32> listDocIDs = new List<Int32>();
foreach (DataRow dr in dt.Rows)
{
    int str = Convert.ToInt32(dr["DocID"].ToString());
    if (!listDocIDs.Contains(str))
        listDocIDs.Add(str);
}
DataTable times = new DataTable();
times.Columns.AddRange(new DataColumn[3] { new DataColumn("DepartmentID", typeof(Int32)),
    new DataColumn("DocID", typeof(Int32)),
    new DataColumn("Days", typeof(Int32)) });
foreach (int DocID in listDocIDs)
{
    DataTable DocID_times = new DataTable();
    using (SqlConnection conn = new SqlConnection(strCon))
    {
        conn.Open();
        SqlDataAdapter adapter = new SqlDataAdapter("getRecordsByDocID", conn);
        adapter.SelectCommand.Parameters.Add("@DocID", SqlDbType.Int).Value = DocID;
        adapter.SelectCommand.CommandType = CommandType.StoredProcedure;
        adapter.Fill(DocID_times);
        conn.Close();
    }
    int j = 0;
    for (int i = 0; i < DocID_times.Rows.Count; i++)
    {
        j = i + 1;
        if (i < (DocID_times.Rows.Count - 1))
        {
            DateTime tempo1 = DateTime.ParseExact(DocID_times.Rows[i]["Time"].ToString(), "dd-MM-yyyy HH:mm:ss",
                System.Globalization.CultureInfo.InvariantCulture);
            DateTime tempo2 = DateTime.ParseExact(DocID_times.Rows[j]["Time"].ToString(), "dd-MM-yyyy HH:mm:ss",
                System.Globalization.CultureInfo.InvariantCulture);
            double mins = (tempo2 - tempo1).TotalMinutes;
            TimeSpan result = TimeSpan.FromMinutes(mins);
            double days = result.TotalDays;
            var rows = times.Select(string.Format("DepartmentID = {0} AND DocID = {1}", DepartmentID, DocID));
            if (rows.Length == 0)
            {
                // Add your Row
                times.Rows.Add(DepartmentID, DocID, days);
            }
            else
            {
                // Update your Days
                rows[0]["days"] = Convert.ToInt32(rows[0]["days"].ToString()) + days;
            }
        }
    }
}

If you're listing all the rows, I would calculate the days between records inside a while loop. It can be done purely with SQL, but it won't be as good as the while loop (which can have access to two rows at a time). To be able to do it purely in SQL, you would have to join the table with itself, joining each record with the next one.
IEnumerable<MySummarizedRow> GetSummarizedRows()
{
    using (var entries = GetRowsOrderedByDocIdAndRowId().GetEnumerator())
    {
        if (entries.MoveNext())
        {
            var previous = entries.Current;
            while (entries.MoveNext())
            {
                var current = entries.Current;
                // Only pair up consecutive rows that belong to the same document.
                if (current.DocId == previous.DocId)
                    yield return new MySummarizedRow(previous.DepartmentId, current.DocId, current.Time.Subtract(previous.Time).TotalDays + 1);
                previous = current;
            }
        }
    }
}
This function ignores the rows for a document that hasn't been passed to another department yet. You can easily change that by yielding a new row with -1 days or something like that.
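The pairwise idea above can also be written so that repeated visits to the same department are summed, which is what the question's own code does with its times table. This is only a sketch under assumptions: the names HandlingTime and DaysPerDepartment are hypothetical, rows are ordered by Time within each document, the data is already in memory, and the "+ 1" mirrors the inclusive day counts in the question's expected output.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original answer.
static class HandlingTime
{
    // Sums, per (department, document), the days between a row and the next row
    // of the same document. The last row of each document has no successor and is ignored.
    public static Dictionary<(int DepartmentId, int DocId), int> DaysPerDepartment(
        IEnumerable<(int DocId, DateTime Time, int DepartmentId)> entries)
    {
        var totals = new Dictionary<(int DepartmentId, int DocId), int>();
        foreach (var doc in entries.GroupBy(e => e.DocId))
        {
            var ordered = doc.OrderBy(e => e.Time).ToList(); // assumption: order by Time
            for (int i = 0; i + 1 < ordered.Count; i++)
            {
                // "+ 1" counts both end days, matching the question's expected table.
                int days = (ordered[i + 1].Time - ordered[i].Time).Days + 1;
                (int DepartmentId, int DocId) key = (ordered[i].DepartmentId, doc.Key);
                totals[key] = totals.TryGetValue(key, out int soFar) ? soFar + days : days;
            }
        }
        return totals;
    }

    static void Main()
    {
        // Sample rows from the question.
        var entries = new[]
        {
            (DocId: 1001, Time: new DateTime(2015, 11, 20), DepartmentId: 1),
            (DocId: 1001, Time: new DateTime(2015, 11, 21), DepartmentId: 2),
            (DocId: 1002, Time: new DateTime(2015, 11, 20), DepartmentId: 1),
            (DocId: 1001, Time: new DateTime(2015, 11, 25), DepartmentId: 1),
            (DocId: 1002, Time: new DateTime(2015, 11, 22), DepartmentId: 3),
            (DocId: 1002, Time: new DateTime(2015, 11, 30), DepartmentId: 1),
        };
        foreach (var pair in DaysPerDepartment(entries).OrderBy(p => p.Key.DocId))
            Console.WriteLine($"{pair.Key.DepartmentId} | {pair.Key.DocId} | {pair.Value}");
    }
}
```

For the sample rows this yields 2 and 5 days for document 1001 (departments 1 and 2) and 3 and 9 days for document 1002 (departments 1 and 3), the same figures as the question's expected table. The main saving over the original function is that all rows are processed in one pass instead of one stored-procedure call per document.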

Compare 2 Datatables to find difference/accuracy between the columns

I have two separate DataTables that look nearly identical, but the values in their rows might differ.
EDIT:
I can have a unique ID by creating a temporary identity column that can be used as the primary key, if that makes it easier. So think of the ID column as the primary key.
Table A
ID | Name | Value1 | Value2 | Value3
-------------------------------------
1 | Bob | 50 | 150 | 35
2 | Bill | 55 | 47 | 98
3 | Pat | 10 | 15 | 45
4 | Cat | 70 | 150 | 35
Table B
ID | Name | Value1 | Value2 | Value3
-------------------------------------
1 | Bob | 30 | 34 | 67
2 | Bill | 55 | 47 | 98
3 | Pat | 100 | 15 | 45
4 | Cat | 70 | 100 | 20
Output Should be:
Table C
ID | Name | TableAValue1 | TableBValue1 | DiffValue1 | ... (same three columns for Value2 and Value3)
------------------------------------------------------
1 | Bob | 50 | 30 | 20
2 | Bill | 55 | 55 | 0
3 | Pat | 10 | 100 | 90
4 | Cat | 70 | 70 | 0
I know the tedious way to do this is a for loop that goes through each row and compares the columns with each other, but I am not sure how to create a new Table C with the results I want. I also think there might be a simpler solution using LINQ, which I am not very familiar with; I would be interested in a LINQ solution if it is faster and takes fewer lines of code. I am looking for the most efficient way of going about this, as these DataTables can be anywhere between 5,000 and 15,000+ rows in size, so memory usage becomes an issue.

LINQ is not faster, at least not in general. But it can help to increase readability.
You can use Enumerable.Join, which might be more efficient than nested loops, but you need a loop to fill your third table anyway. So the first two columns are the identifiers and the rest are the values:
var query = from r1 in table1.AsEnumerable()
            join r2 in table2.AsEnumerable()
            on new { ID = r1.Field<int>("ID"), Name = r1.Field<string>("Name") }
            equals new { ID = r2.Field<int>("ID"), Name = r2.Field<string>("Name") }
            select new { r1, r2 };
var columnsToCompare = table1.Columns.Cast<DataColumn>().Skip(2);
// table3 is assumed to exist already with the same columns as table1.
foreach (var rowInfo in query)
{
    var row = table3.Rows.Add();
    row.SetField("ID", rowInfo.r1.Field<int>("ID"));
    row.SetField("Name", rowInfo.r1.Field<string>("Name"));
    foreach (DataColumn col in columnsToCompare)
    {
        int val1 = rowInfo.r1.Field<int>(col.ColumnName);
        int val2 = rowInfo.r2.Field<int>(col.ColumnName);
        int diff = Math.Abs(val1 - val2);
        row.SetField(col.ColumnName, diff);
    }
}

var tableC = new DataTable();
tableC.Columns.Add(new DataColumn("ID"));
tableC.Columns.Add(new DataColumn("Name"));
tableC.Columns.Add(new DataColumn("TableAValue1"));
tableC.Columns.Add(new DataColumn("TableBValue1"));
tableC.Columns.Add(new DataColumn("DiffValue1"));
foreach (DataRow rowA in tableA.Rows)
{
    foreach (DataRow rowB in tableB.Rows)
    {
        if (Convert.ToInt32(rowA["ID"]) == Convert.ToInt32(rowB["ID"]) &&
            rowA["Name"].ToString() == rowB["Name"].ToString() &&
            Convert.ToInt32(rowA["Value1"]) != Convert.ToInt32(rowB["Value1"]))
        {
            var newRow = tableC.NewRow();
            newRow["ID"] = rowA["ID"];
            newRow["Name"] = rowA["Name"];
            newRow["TableAValue1"] = rowA["Value1"];
            newRow["TableBValue1"] = rowB["Value1"];
            newRow["DiffValue1"] = Convert.ToInt32(rowA["Value1"]) - Convert.ToInt32(rowB["Value1"]);
            tableC.Rows.Add(newRow);
        }
    }
}

Using LINQ, create an anonymous type as follows:
var joinedRows = (from rowA in TableA.AsEnumerable()
                  from rowB in TableB.AsEnumerable()
                  where rowA.Field<String>("Name") == rowB.Field<String>("Name")
                  select new
                  {
                      ID = rowA.Field<int>("ID"),
                      Name = rowA.Field<String>("Name"),
                      TableAValue1 = rowA.Field<int>("Value1"),
                      TableBValue1 = rowB.Field<int>("Value1"),
                      DiffValue1 = Math.Abs(rowA.Field<int>("Value1") - rowB.Field<int>("Value1")),
                      TableAValue2 = rowA.Field<int>("Value2"),
                      TableBValue2 = rowB.Field<int>("Value2"),
                      DiffValue2 = Math.Abs(rowA.Field<int>("Value2") - rowB.Field<int>("Value2")),
                      TableAValue3 = rowA.Field<int>("Value3"),
                      TableBValue3 = rowB.Field<int>("Value3"),
                      DiffValue3 = Math.Abs(rowA.Field<int>("Value3") - rowB.Field<int>("Value3"))
                  });
Table.AsEnumerable() will give you an IEnumerable<DataRow>.
row.Field<T> will cast the value to the correct type for you.
You can now use the anonymous type of joinedRows and create your new DataTable from it.

This uses a strategy similar to kippermand's, but will probably perform slightly better on large sets of data by avoiding the O(n²) complexity of checking every ID against every other ID, and by reusing the values extracted from the data table:
// joining by row location
var joinedTableRows =
    dt1.AsEnumerable().Zip(dt2.AsEnumerable(),
        (r1, r2) => new { r1, r2 });
// or, joining by ID
var joinedTableRows2 =
    dt1.AsEnumerable().Join(dt2.AsEnumerable(),
        r => r.Field<int>("ID"),
        r => r.Field<int>("ID"),
        (r1, r2) => new { r1, r2 });
var result =
    from row in joinedTableRows
    let rowA = row.r1
    let rowB = row.r2
    let tableAValue1 = rowA.Field<int>("Value1")
    let tableBValue1 = rowB.Field<int>("Value1")
    let tableAValue2 = rowA.Field<int>("Value2")
    let tableBValue2 = rowB.Field<int>("Value2")
    let tableAValue3 = rowA.Field<int>("Value3")
    let tableBValue3 = rowB.Field<int>("Value3")
    select new
    {
        ID = row.r1.Field<int>("ID"),
        Name = row.r1.Field<string>("Name"),
        TableAValue1 = tableAValue1,
        TableBValue1 = tableBValue1,
        DiffValue1 = Math.Abs(tableAValue1 - tableBValue1),
        TableAValue2 = tableAValue2,
        TableBValue2 = tableBValue2,
        DiffValue2 = Math.Abs(tableAValue2 - tableBValue2),
        TableAValue3 = tableAValue3,
        TableBValue3 = tableBValue3,
        DiffValue3 = Math.Abs(tableAValue3 - tableBValue3)
    };
Depending on how your data needs to be consumed, you could either declare a class matching this anonymous type, and consume that directly (which is what I'd prefer), or you can create a DataTable from these objects, if you have to.
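If a DataTable is required, Table C could be filled directly from the joined rows, generating the TableA/TableB/Diff columns in a loop rather than spelling them out. This is a sketch only: the names TableDiff, BuildDiffTable and MakeTable are hypothetical, the join is by ID alone, and every column other than ID and Name is assumed to be an int value column.

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical helper, not part of the original answers.
static class TableDiff
{
    public static DataTable BuildDiffTable(DataTable tableA, DataTable tableB)
    {
        // Assumption: everything except ID and Name is an int column to compare.
        var valueColumns = tableA.Columns.Cast<DataColumn>()
            .Select(c => c.ColumnName)
            .Where(n => n != "ID" && n != "Name")
            .ToList();

        var tableC = new DataTable("TableC");
        tableC.Columns.Add("ID", typeof(int));
        tableC.Columns.Add("Name", typeof(string));
        foreach (string name in valueColumns)
        {
            tableC.Columns.Add("TableA" + name, typeof(int));
            tableC.Columns.Add("TableB" + name, typeof(int));
            tableC.Columns.Add("Diff" + name, typeof(int));
        }

        // Enumerable.Join builds a lookup on one side, so this is not a nested loop.
        var pairs = tableA.AsEnumerable().Join(tableB.AsEnumerable(),
            a => a.Field<int>("ID"),
            b => b.Field<int>("ID"),
            (a, b) => new { a, b });

        foreach (var pair in pairs)
        {
            DataRow row = tableC.NewRow();
            row["ID"] = pair.a.Field<int>("ID");
            row["Name"] = pair.a.Field<string>("Name");
            foreach (string name in valueColumns)
            {
                int valueA = pair.a.Field<int>(name);
                int valueB = pair.b.Field<int>(name);
                row["TableA" + name] = valueA;
                row["TableB" + name] = valueB;
                row["Diff" + name] = Math.Abs(valueA - valueB);
            }
            tableC.Rows.Add(row);
        }
        return tableC;
    }

    // Builds a table with the question's schema from raw row values.
    public static DataTable MakeTable(params object[][] rows)
    {
        var t = new DataTable();
        t.Columns.Add("ID", typeof(int));
        t.Columns.Add("Name", typeof(string));
        t.Columns.Add("Value1", typeof(int));
        t.Columns.Add("Value2", typeof(int));
        t.Columns.Add("Value3", typeof(int));
        foreach (object[] r in rows) t.Rows.Add(r);
        return t;
    }

    static void Main()
    {
        var tableA = MakeTable(new object[] { 1, "Bob", 50, 150, 35 },
                               new object[] { 3, "Pat", 10, 15, 45 });
        var tableB = MakeTable(new object[] { 1, "Bob", 30, 34, 67 },
                               new object[] { 3, "Pat", 100, 15, 45 });
        foreach (DataRow row in BuildDiffTable(tableA, tableB).Rows)
            Console.WriteLine(string.Join(" | ", row.ItemArray));
    }
}
```

Rows whose ID appears in only one of the two tables are dropped by the inner join; whether that is acceptable depends on the data.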

How to select rows from DataTable based on Index / Row Number?

I have a DataTable. I want to select the rows based on the Index/Row Number of the rows in DataTable.
Suppose below is the DataTable:
---------------- ---------------
| ID | Name | | Index/RowNo |
---------------- ---------------
| A001 | John | | 1 |
| A002 | Foo | | 2 |
| A003 | Rambo | | 3 |
| A004 | Andy | | 4 |
| ... | ... | | 5 |
---------------- ---------------
Now, I want to select rows from the DataTable shown above using a criterion such as Index > 2. In that case, the first entry at Index 1, A001 | John, will not become part of the resulting DataTable. How can I do it efficiently?
Moreover, I want to have my result both as a DataTable and as the outcome of a LINQ query.
I am trying to do something like this:
var result = dt.Select("RowNum > 1", "");
OR
var result = from row in dt.AsEnumerable()
where RowNum > 1
select row;

You wrote that you are trying to do something like this:
var result = dt.Select("RowNum > 1", "");
You can use Enumerable.Skip with a DataTable, since AsEnumerable() returns an IEnumerable<DataRow>:
IEnumerable<DataRow> allButFirst = table.AsEnumerable().Skip(1);
To get a new DataTable from it:
DataTable tblAllButFirst = allButFirst.CopyToDataTable();
If your next question is how you can take only rows with given indices (zero-based):
var allowedIndices = new[] { 2, 4, 7, 8, 9, 10 };
DataTable tblAllowedRows = table.AsEnumerable()
    .Where((r, i) => allowedIndices.Contains(i))
    .CopyToDataTable();

var result = table.AsEnumerable()
    .Where((row, index) => index > 1) // index is zero-based, so this skips the first two rows
    .CopyToDataTable();
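One thing to watch with the snippets above: CopyToDataTable throws an InvalidOperationException when the filtered sequence contains no rows. A minimal sketch of a guard follows; the names RowSlicer and RowsFrom are hypothetical, and the fallback to an empty table with the same schema (DataTable.Clone) is one possible choice, not the only one.

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical helper, not part of the original answers.
static class RowSlicer
{
    // Returns the rows from startIndex (zero-based) onward as a new DataTable.
    public static DataTable RowsFrom(DataTable table, int startIndex)
    {
        var rows = table.AsEnumerable().Skip(startIndex).ToList();
        // CopyToDataTable throws when there are no rows, so fall back to
        // an empty table that has the same columns as the source.
        return rows.Count > 0 ? rows.CopyToDataTable() : table.Clone();
    }

    static void Main()
    {
        var table = new DataTable();
        table.Columns.Add("ID", typeof(string));
        table.Columns.Add("Name", typeof(string));
        table.Rows.Add("A001", "John");
        table.Rows.Add("A002", "Foo");
        table.Rows.Add("A003", "Rambo");
        table.Rows.Add("A004", "Andy");

        Console.WriteLine(RowsFrom(table, 1).Rows.Count);  // 3
        Console.WriteLine(RowsFrom(table, 10).Rows.Count); // 0
    }
}
```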

Sort several duplicate values in DataTable to one row

I've imported a DataTable from a SQL Database using SqlDataAdapter and Fill-Method.
My datatable looks like this:
Timestamp(unix time) | Value
x | 10
x | 42
x | 643
y | 5
y | 9
y | 70
...and so on. The table contains a lot of values (1000+) but has always three rows with the same timestamp.
Now I want it to look like this:
Timestamp(unix time) | Value 1 | Value 2 | Value 3
x | 10 | 42 | 643
y | 5 | 9 | 70
How can I sort it this way?
(If there are more than three values, the program should just insert the first three values it has found.)
Thanks for any help!

Thanks for your approach! I solved it myself now.
This is how I've done it:
var grouped = from myRow in myDataTable.AsEnumerable()
              group myRow by myRow.Field<int>("TIMESTAMP");
foreach (var timestamp in grouped)
{
    string[] myRow = new string[5];
    myRow[0] = timestamp.Key.ToString();
    int i = 1;
    foreach (var value in timestamp)
    {
        myRow[i] = value.Field<double>("VALUE").ToString();
        i++;
        if (i > 4)
            break;
    }
    mySortedTable.Rows.Add(myRow);
}

I think this may also be solvable in SQL, but if you want to do it programmatically, I have tested the following in LinqPad:
void Main()
{
    var list = new List<Tuple<string,int>> {
        Tuple.Create("x", 10),
        Tuple.Create("x", 42),
        Tuple.Create("x", 643),
        Tuple.Create("y", 5),
        Tuple.Create("y", 9),
        Tuple.Create("y", 70),
    };
    var result =
        from grp in list.GroupBy(t => t.Item1)
        let firstThree = grp.Select(t => t.Item2).Take(3).ToList()
        select new {
            Key = grp.Key,
            Value1 = firstThree[0],
            Value2 = firstThree[1],
            Value3 = firstThree[2] };
    foreach (var item in result)
        Console.WriteLine(item);
}
It assumes that you have at least three elements per group; otherwise you'll get an out-of-range exception.
While the end result is an anonymous type, you could easily pipe the results of the operation into a DataRow instead.
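Piping the groups into DataRows, while tolerating groups with fewer than three values, might look like the sketch below. The names TimestampPivot and Pivot are hypothetical, and the column names and types (an int TIMESTAMP and a double VALUE) are assumptions taken from the self-answer above. Slots without a value are left as DBNull.

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical helper, not part of the original answers.
static class TimestampPivot
{
    // Produces one row per timestamp holding the first 'valuesPerRow' values found.
    public static DataTable Pivot(DataTable source, int valuesPerRow = 3)
    {
        var result = new DataTable();
        result.Columns.Add("Timestamp", typeof(int));
        for (int n = 1; n <= valuesPerRow; n++)
            result.Columns.Add("Value" + n, typeof(double));

        // GroupBy keeps groups in order of first appearance and rows in source order.
        foreach (var group in source.AsEnumerable().GroupBy(r => r.Field<int>("TIMESTAMP")))
        {
            DataRow row = result.NewRow();
            row["Timestamp"] = group.Key;
            int slot = 1;
            // Take() ignores extra values; missing ones stay DBNull.
            foreach (DataRow item in group.Take(valuesPerRow))
                row["Value" + slot++] = item.Field<double>("VALUE");
            result.Rows.Add(row);
        }
        return result;
    }

    static void Main()
    {
        var source = new DataTable();
        source.Columns.Add("TIMESTAMP", typeof(int));
        source.Columns.Add("VALUE", typeof(double));
        source.Rows.Add(100, 10.0);
        source.Rows.Add(100, 42.0);
        source.Rows.Add(100, 643.0);
        source.Rows.Add(200, 5.0);
        source.Rows.Add(200, 9.0);
        source.Rows.Add(200, 70.0);

        foreach (DataRow row in Pivot(source).Rows)
            Console.WriteLine(string.Join(" | ", row.ItemArray));
    }
}
```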
