Sorry about the confusing subject line :)
I want to make a SQL-like query against my DataTables. I want to do something like this:
// Named "BadValues"; rows contain: id1, id2
DataTable tableReadFromFile = readFromFile();
// Named "AllValues"; rows contain: id1, id2
DataTable tableReadFromSql = readFromSql();
DataTable resultTable =
    tableReadFromSql.Select("where AllValues.id1 not in (select id1 from BadValues) and AllValues.id2 not in (select id2 from BadValues)");
So if my "BadValues" table would look like this:
id1 id2
0 1
10 11
20 21
and my "AllValues" table would look like this:
id1 id2
0 1
0 2
1 1
10 11
10 12
12 11
20 21
20 22
22 21
I would like the resultTable to look like this:
id1 id2
0 2
1 1
10 12
12 11
20 22
22 21
In other words: if the pair (id1, id2) exists in both "BadValues" and "AllValues", I want to remove it so that it doesn't appear in the result table.
This would have been rather simple to do in SQL if the "BadValues" table existed in the SQL database, but since it is loaded from a file, that is not possible.
As it is now, I loop through all rows in "BadValues" and construct individual SQL queries with the id1 and id2 values filled in. Since I have quite a lot of data, that is very time consuming.
Any tip is appreciated!
I think this will do it:
DataTable tblBadValues; // filled however
DataTable tblAllValues; // filled however
tblBadValues.Merge(tblAllValues); // this will add to tblBadValues all records
// that aren't already in there
DataTable tblResults = tblBadValues.GetChanges(); // this will get the records
// that were just added by the merge, meaning it will return all the records
// that were originally in tblAllValues that weren't also in tblBadValues
tblBadValues.RejectChanges(); // in case you need to re-use tblBadValues
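One caveat, as a hedged sketch (column names id1/id2 taken from the question): Merge can only match duplicates when the table has a primary key, and GetChanges only sees the merged rows if they still carry RowState.Added. Using GetChanges(DataRowState.Added) also guards against matched rows being reported as Modified:

```csharp
using System;
using System.Data;

var tblBadValues = new DataTable();
tblBadValues.Columns.Add("id1", typeof(int));
tblBadValues.Columns.Add("id2", typeof(int));
tblBadValues.Rows.Add(0, 1);
tblBadValues.Rows.Add(10, 11);

var tblAllValues = tblBadValues.Clone();   // same schema, no rows
tblAllValues.Rows.Add(0, 1);               // duplicate of a bad row
tblAllValues.Rows.Add(0, 2);               // genuinely new row

// Without a composite primary key, Merge cannot match rows and appends everything.
tblBadValues.PrimaryKey = new[] { tblBadValues.Columns["id1"], tblBadValues.Columns["id2"] };
tblBadValues.AcceptChanges();              // existing bad rows become Unchanged

tblBadValues.Merge(tblAllValues);
// Only the rows added by the merge, i.e. rows of tblAllValues not already present:
DataTable tblResults = tblBadValues.GetChanges(DataRowState.Added);
tblBadValues.RejectChanges();              // restore tblBadValues for re-use

Console.WriteLine(tblResults.Rows.Count);  // 1 (just the (0, 2) row)
```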
Using LINQ to DataSet:
var badValues = new HashSet<Tuple<int, int>>(
tableReadFromFile.AsEnumerable().
Select(row =>
new Tuple<int, int>(row.Field<int>("id1"), row.Field<int>("id2"))));
var result = tableReadFromSql.AsEnumerable().
Where(row => !(badValues.Contains(
new Tuple<int, int>(row.Field<int>("id1"), row.Field<int>("id2")))));
The first statement builds a HashSet of tuples representing the bad values.
The second filters the second table down to the rows whose ids are not in the hash set.
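The Where call yields an IEnumerable&lt;DataRow&gt;; to get an actual DataTable back (as the question asks for), you can append CopyToDataTable. A self-contained sketch using the sample data from the question (the MakeTable helper is just for the demo):

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

DataTable MakeTable(params (int Id1, int Id2)[] rows)
{
    var t = new DataTable();
    t.Columns.Add("id1", typeof(int));
    t.Columns.Add("id2", typeof(int));
    foreach (var r in rows) t.Rows.Add(r.Id1, r.Id2);
    return t;
}

var tableReadFromFile = MakeTable((0, 1), (10, 11), (20, 21)); // "BadValues"
var tableReadFromSql  = MakeTable((0, 1), (0, 2), (1, 1), (10, 11), (10, 12),
                                  (12, 11), (20, 21), (20, 22), (22, 21)); // "AllValues"

var badValues = new HashSet<Tuple<int, int>>(
    tableReadFromFile.AsEnumerable()
        .Select(row => Tuple.Create(row.Field<int>("id1"), row.Field<int>("id2"))));

// Keep only rows whose (id1, id2) pair is not in the bad set, then materialize.
DataTable resultTable = tableReadFromSql.AsEnumerable()
    .Where(row => !badValues.Contains(
        Tuple.Create(row.Field<int>("id1"), row.Field<int>("id2"))))
    .CopyToDataTable();

Console.WriteLine(resultTable.Rows.Count); // 6
```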
I have an idea, although you would have to use LINQ to SQL.
var query = from data in AllObjects
select data;
foreach (DataObject o in BadData)
{
DataObject temp = o; // local copy so the lambda doesn't capture the loop variable
query = query.Where(x => !((x.id1 == temp.id1) && (x.id2 == temp.id2)));
}
//query now contains the expression to get only good rows.
Only when query is iterated (or .ToArray() is called, etc.) does it execute a call to your database server.
Related
I have a table of 100 records; in some records the first and last columns are the same. I need distinct records based on the first and last columns.
For example, if the first and last columns are the same in 5 rows, make them distinct and return only one record from the database. I hope you understand my question.
The table looks like this:
FirstField 2ndField LastField
----------------------------------
a dd 10
a dd 20
b ff 50
a gg 10
a ng 10
I tried:
DB.Information.Distinct().ToList();
Expected output:
a dd 10
a dd 20
b ff 50
Try as follows:
var distinctRecords = DB.Information.GroupBy(i => new {i.FirstField, i.LastField})
.Select(g => g.FirstOrDefault()).ToList();
You can also select distinct rows over multiple columns using Distinct():
var informations = DB.Information.Select(x => new { x.FirstField, x.LastField }).Distinct().ToList();
Here we first project the columns we want to be distinct into an anonymous type;
this way you can get distinct combinations of multiple columns without GroupBy in LINQ.
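To see the difference between the two answers in action, here is an in-memory sketch (value tuples standing in for the Information rows, which is an assumption about the real entity type): GroupBy keeps one full record per (FirstField, LastField) pair, matching the expected output above, while the anonymous-type Distinct would return only the two projected columns.

```csharp
using System;
using System.Linq;

// In-memory stand-in for DB.Information (hypothetical row shape).
var information = new[]
{
    (FirstField: "a", SecondField: "dd", LastField: 10),
    (FirstField: "a", SecondField: "dd", LastField: 20),
    (FirstField: "b", SecondField: "ff", LastField: 50),
    (FirstField: "a", SecondField: "gg", LastField: 10),
    (FirstField: "a", SecondField: "ng", LastField: 10),
};

// One full record per distinct (FirstField, LastField) pair.
var distinctRecords = information
    .GroupBy(i => new { i.FirstField, i.LastField })
    .Select(g => g.First())
    .ToList();

foreach (var r in distinctRecords)
    Console.WriteLine($"{r.FirstField} {r.SecondField} {r.LastField}");
// a dd 10
// a dd 20
// b ff 50
```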
I have a list:
int[] listA = { 23, 25, 41, 69, 20, 22, 30 };
and my db table contains a column jobid (int) with values like
{ 20, 61, 55, 14, 21, 12, 0, 11 }, etc.
It also has a column companyId (int).
The basic structure is:
jobid companyId
12 451
22 122
30 365 ...
I want to get the company ids from the table whose job id exists in listA.
I tried
db.mqJobMasters.Where(e => e.jobId == newlist2).Select(x => x.jobCompanyId).ToList();
but it gives an error: == cannot be applied to operands of type int and List&lt;int&gt;.
You can try something like
var result = db.mqJobMasters.Where(e => listA.Contains(e.JobId)).Select(x => x.jobCompanyId).ToList();
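When run against the database, Contains is translated by LINQ to Entities into a SQL IN clause (WHERE JobId IN (23, 25, ...)). A minimal in-memory sketch with the sample rows from the question (tuples standing in for the mqJobMasters entities):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var listA = new List<int> { 23, 25, 41, 69, 20, 22, 30 };

// Stand-in for db.mqJobMasters with the rows shown in the question.
var mqJobMasters = new[]
{
    (JobId: 12, JobCompanyId: 451),
    (JobId: 22, JobCompanyId: 122),
    (JobId: 30, JobCompanyId: 365),
};

// Contains filters to the job ids present in listA.
var result = mqJobMasters
    .Where(e => listA.Contains(e.JobId))
    .Select(x => x.JobCompanyId)
    .ToList();

Console.WriteLine(string.Join(", ", result)); // 122, 365
```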
I'm looking for a way to return a dynamic column list from a LINQ join of two datatables.
First, this is not a duplicate. I have already studied and discarded:
C# LINQ list select columns dynamically from a joined dataset
Creating a LINQ select from multiple tables
How to do a LINQ join that behaves exactly like a physical database inner join?
(and many others)
Here is my starting point:
public static DataTable JoinDataTables(DataTable dt1, DataTable dt2, string table1KeyField, string table2KeyField, string[] columns) {
DataTable result = ( from dataRows1 in dt1.AsEnumerable()
join dataRows2 in dt2.AsEnumerable()
on dataRows1.Field<string>(table1KeyField) equals dataRows2.Field<string>(table2KeyField)
[...I NEED HELP HERE with the SELECT....]).CopyToDataTable();
return result;
}
A few notes and requirements:
There is no database engine. The data sources are large CSV files (500K+ records) being read into C# DataTables.
Because the CSVs are large, looping through each record in the join is a bad solution for performance reasons. I've already tried record looping and it's just too slow. I get great performance on the join above, but I can't find a way to have it return just the columns I want (specified by the caller) without looping records.
If I need to loop over columns in the join, that is perfectly fine, I just don't want to loop rows.
I want to be able to pass in an array of column names and return just those columns in the resulting DataTable. If both datatables being passed in happen to have a column with the same name, and that column is in my array of column names, just pass back either column, because the data will be the same in both columns in that case.
If I need to pass in 2 arrays (1 for each datatable's desired columns) that's fine, but 1 array of column names would be ideal.
The column list cannot be static and hardcoded into the function. The reason is because my JoinDataTables() is called from many different places in my system in order to join a wide variety of CSVs-turned-datatables, and each CSV file has very different columns.
I don't want all columns returned in the resulting DataTable -- just the columns I specify in the columns array.
So suppose, before calling JoinDataTables(), I have the following 2 datatables:
Table: T1
T1A T1B T1C T1D
==================
10 AA H1 Foo1
11 AB H1 Foo2
12 AA H2 Foo1
13 AB H2 Foo2
Table: T2
T2A T2X T2Y T2Z
==================
12 N1 O1 Yeah1
17 N2 O2 Yeah2
18 N3 O1 Yeah1
19 N4 O2 Yeah2
Now suppose we join these 2 tables like so:
ON T1.T1A = T2.T2A
select * from [join]
and that yields this resultset:
T1A T1B T1C T1D T2A T2X T2Y T2Z
====================================
12 AA H2 Foo1 12 N1 O1 Yeah1
Notice that only 1 row is yielded by the join.
Now to the crux of my question. Suppose that for a given use case, I want to return only 4 columns from this join: T1A, T1D, T2A, and T2Y. So my resultset would then look like this:
T1A T1D T2A T2Y
==================
12 Foo1 12 O1
I'd like to be able to call my JoinDataTables function like so:
DataTable dt = JoinDataTables(dt1, dt2, "T1A", "T2A", new string[] {"T1A", "T1D", "T2A", "T2Y"});
Keeping in mind performance and the fact that I don't want to loop through records (because it's slow for large sets of data), how can this be accomplished? (The join is already working well; now I just need a correct select segment, whether via new {...} or whatever you think.)
I cannot accept a solution with a hardcoded column list inside the function. I have found examples of that approach all over SO.
Any ideas?
EDIT: I'd be ok getting ALL columns back every time, but every attempt I've made to include all columns has resulted in some kind of FULL OUTER JOIN or CROSS JOIN, returning orders of magnitude more records than it should. So, I'd be open to getting all columns back, as long as I don't get the cross join.
I'm not sure of the performance with 500k records, but here is an attempted solution.
Since you are combining two subsets of DataRows from different tables, there are no easy operations that will create the subset or create a new DataTable from the subsets (though I have an extension method for flattening an IEnumerable<anon> where anon = new { DataRow1, DataRow2, ... } from a join, it would probably be slow for you).
Instead, I pre-create an answer DataTable with the columns requested and then use LINQ to build the value arrays to be added as the rows.
public static DataTable JoinDataTables(DataTable dt1, DataTable dt2, string table1KeyField, string table2KeyField, string[] columns) {
var rtnCols1 = dt1.Columns.Cast<DataColumn>().Where(dc => columns.Contains(dc.ColumnName)).ToList();
var rc1 = rtnCols1.Select(dc => dc.ColumnName).ToList();
var rtnCols2 = dt2.Columns.Cast<DataColumn>().Where(dc => columns.Contains(dc.ColumnName) && !rc1.Contains(dc.ColumnName)).ToList();
var rc2 = rtnCols2.Select(dc => dc.ColumnName).ToList();
var work = from dataRows1 in dt1.AsEnumerable()
join dataRows2 in dt2.AsEnumerable()
on dataRows1.Field<string>(table1KeyField) equals dataRows2.Field<string>(table2KeyField)
select (from c1 in rc1 select dataRows1[c1]).Concat(from c2 in rc2 select dataRows2[c2]).ToArray();
var result = new DataTable();
foreach (var rc in rtnCols1)
result.Columns.Add(rc.ColumnName, rc.DataType);
foreach (var rc in rtnCols2)
result.Columns.Add(rc.ColumnName, rc.DataType);
foreach (var rowVals in work)
result.Rows.Add(rowVals);
return result;
}
Since you were using query syntax, I did as well, but normally I would probably do the select like so:
select rc1.Select(c1 => dataRows1[c1]).Concat(rc2.Select(c2 => dataRows2[c2])).ToArray();
Updated: It is probably worthwhile to use the column ordinals instead of the names to index into each DataRow by replacing the definitions of rc1 and rc2:
var rc1 = rtnCols1.Select(dc => dc.Ordinal).ToList();
var rc1Names = rtnCols1.Select(dc => dc.ColumnName).ToHashSet();
var rtnCols2 = dt2.Columns.Cast<DataColumn>().Where(dc => columns.Contains(dc.ColumnName) && !rc1Names.Contains(dc.ColumnName)).ToList();
var rc2 = rtnCols2.Select(dc => dc.Ordinal).ToList();
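As a sanity check, here is a self-contained run of the original (name-indexed) version of the function against the T1/T2 sample tables from the question; the key columns are stored as strings because the join uses Field&lt;string&gt;:

```csharp
using System;
using System.Data;
using System.Linq;

static DataTable JoinDataTables(DataTable dt1, DataTable dt2,
    string table1KeyField, string table2KeyField, string[] columns)
{
    var rtnCols1 = dt1.Columns.Cast<DataColumn>()
        .Where(dc => columns.Contains(dc.ColumnName)).ToList();
    var rc1 = rtnCols1.Select(dc => dc.ColumnName).ToList();
    var rtnCols2 = dt2.Columns.Cast<DataColumn>()
        .Where(dc => columns.Contains(dc.ColumnName) && !rc1.Contains(dc.ColumnName)).ToList();
    var rc2 = rtnCols2.Select(dc => dc.ColumnName).ToList();

    // Rows are streamed once through the join; columns are looped, rows are not revisited.
    var work = from r1 in dt1.AsEnumerable()
               join r2 in dt2.AsEnumerable()
               on r1.Field<string>(table1KeyField) equals r2.Field<string>(table2KeyField)
               select rc1.Select(c => r1[c]).Concat(rc2.Select(c => r2[c])).ToArray();

    var result = new DataTable();
    foreach (var rc in rtnCols1.Concat(rtnCols2))
        result.Columns.Add(rc.ColumnName, rc.DataType);
    foreach (var rowVals in work)
        result.Rows.Add(rowVals);
    return result;
}

// Build the T1/T2 sample tables from the question.
var t1 = new DataTable();
foreach (var c in new[] { "T1A", "T1B", "T1C", "T1D" }) t1.Columns.Add(c, typeof(string));
t1.Rows.Add("10", "AA", "H1", "Foo1");
t1.Rows.Add("11", "AB", "H1", "Foo2");
t1.Rows.Add("12", "AA", "H2", "Foo1");
t1.Rows.Add("13", "AB", "H2", "Foo2");

var t2 = new DataTable();
foreach (var c in new[] { "T2A", "T2X", "T2Y", "T2Z" }) t2.Columns.Add(c, typeof(string));
t2.Rows.Add("12", "N1", "O1", "Yeah1");
t2.Rows.Add("17", "N2", "O2", "Yeah2");
t2.Rows.Add("18", "N3", "O1", "Yeah1");
t2.Rows.Add("19", "N4", "O2", "Yeah2");

DataTable dt = JoinDataTables(t1, t2, "T1A", "T2A",
    new[] { "T1A", "T1D", "T2A", "T2Y" });

Console.WriteLine(string.Join(" ", dt.Rows[0].ItemArray)); // 12 Foo1 12 O1
```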
I have two Excel tables for input.
The structure of the first one is like (first_table, fields: id, value, no primary key):
ID1 4
ID1 5
ID1 2
ID2 3
ID2 1
ID3 1
ID4 1
ID4 3
etc till the end of the document (it's not determined)
The second one is like (second_table, fields: id, value, ID is primary key):
ID1 2
ID2 5
ID3 1
ID4 2
etc till the end of the document (it's not determined)
I would like to create a new table (let's call it output_table) from these. The new table should contain the same fields: ID and value. In this new table I want to write each record from the first table, keeping its order (this is very important, because it's a timeline). The values should change according to the following conditions:
- the IDs are matched between the two tables
- if first_table.value is higher than or equal to second_table.value, then output.value := second_table.value. At this point I want to omit the records below it with the same ID (in the first table) and step to the next ID
- if second_table.value is higher than first_table.value, then output_table.value := first_table.value, and second_table.value := (second_table.value - first_table.value) for the next step (if this case happens, I want to examine the next record with the reduced value until a new ID comes up in first_table or the first condition becomes true)
I could not figure out a proper algorithm for this; please help me! Thank you!
I'm working in C#, and I have already created lists of (record) objects from the two input tables (so maybe LINQ would help me).
OK. You could create a class (you could use a Tuple&lt;string, int&gt; instead, but a class may be easier to understand) for the first_table and output table values.
public class MyDatas {
public string Id {get;set;}
public int Value {get;set;}
}
Imagine you have a List&lt;MyDatas&gt; from first_table, and a Dictionary&lt;string, int&gt; for second_table, which seems reasonable.
Then you create an empty list of MyDatas, which is your output.
var table1 = new List<MyDatas>(<content of first_table>);
var table2 = new Dictionary<string, int>(<content of second_table>);
var output = new List<MyDatas>();
//good old foreach, this may be clearer than linq in this case.
var done = new HashSet<string>(); // IDs for which the first condition has already fired
foreach (var l1 in table1) {
    var id = l1.Id;
    // the first condition already fired for this id: omit the remaining rows with the same id
    if (done.Contains(id)) continue;
    // default: take the table1 value (if id is not in table2, or table2's value > table1's value)
    var newData = new MyDatas { Id = id, Value = l1.Value };
    output.Add(newData);
    // id of table1 is not in table2: go to the next line
    if (!table2.ContainsKey(id)) continue;
    if (l1.Value >= table2[id]) {
        // table1 value >= table2 value => take the table2 value and skip this id from now on
        newData.Value = table2[id];
        done.Add(id);
    } else {
        // table2 value > table1 value => keep the table1 value and decrement table2's value
        table2[id] -= l1.Value;
    }
}
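Applied to the sample data from the question, the rules play out like this (a standalone sketch using value tuples instead of a class, and including the rule that the remaining rows of an ID are skipped once the first condition fires):

```csharp
using System;
using System.Collections.Generic;

// Sample data from the question.
var first = new List<(string Id, int Value)>
{
    ("ID1", 4), ("ID1", 5), ("ID1", 2), ("ID2", 3),
    ("ID2", 1), ("ID3", 1), ("ID4", 1), ("ID4", 3),
};
var second = new Dictionary<string, int> { ["ID1"] = 2, ["ID2"] = 5, ["ID3"] = 1, ["ID4"] = 2 };

var output = new List<(string Id, int Value)>();
var done = new HashSet<string>(); // IDs for which condition 1 already fired

foreach (var row in first)
{
    if (done.Contains(row.Id)) continue;                      // omit the rest of this ID
    if (!second.TryGetValue(row.Id, out var remaining)) { output.Add(row); continue; }
    if (row.Value >= remaining)
    {
        output.Add((row.Id, remaining));                      // condition 1: cap at table2's value
        done.Add(row.Id);
    }
    else
    {
        output.Add(row);                                      // condition 2: keep table1's value
        second[row.Id] = remaining - row.Value;               // and decrement table2's value
    }
}

foreach (var o in output) Console.WriteLine($"{o.Id} {o.Value}");
// ID1 2
// ID2 3
// ID2 1
// ID3 1
// ID4 1
// ID4 1
```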
I have a stored procedure :
CREATE PROCEDURE SELECT_Some_Data
@Sreachstr nvarchar(200)
AS
BEGIN
SELECT ROW_NUMBER() OVER (ORDER BY [Document].DocNo DESC) AS Row, *
FROM Document WHERE DocNo = @Sreachstr
END
When I execute it with @Sreachstr = '153', it returns 15 records.
I use Entity Framework to get the data returned by the stored procedure:
public static List<DocumentInfo_Full_Data> SelectByDocNo(string SearchStr)
{
using (LibEntities_new db = new LibEntities_new())
{
return SelectByDocNo(db, SearchStr);
}
}
private static List<DocumentInfo_Full_Data> SelectByDocNo(LibEntities_new db, String SearchStr)
{
return db.SelectByDocNo(SearchStr).ToList();
}
public ObjectResult<DocumentInfo_Full_Data> SelectByDocNo(global::System.String searchStr)
{
ObjectParameter searchStrParameter;
if (searchStr != null)
{
searchStrParameter = new ObjectParameter("SearchStr", searchStr);
}
else
{
searchStrParameter = new ObjectParameter("SearchStr", typeof(global::System.String));
}
return base.ExecuteFunction<DocumentInfo_Full_Data>("SelectByDocNo", searchStrParameter);
}
When I call this method with SearchStr = "153", I see one record repeated 15 times instead of 15 different records.
I had this happen to me once when I was selecting rows from a view in EF.
Since the view itself doesn't have a primary key, EF wasn't able to determine the key - instead, EF created a "guessed" key based on all non-nullable columns from the view.
My view returned four rows of data, e.g.
Col1 Col2 Col3 Col4
1 2 'ABC' 42
1 2 'DEF' 57
1 2 'GHI' 4711
1 2 'JKL' 404
--> my query worked just fine in SQL Server Management Studio.
The "key" that EF had guessed was based on (Col1, Col2).
Now when I retrieved the rows using EF, this is what happened:
the first row got selected - EF saw it didn't have any data yet, so it stored that row in its result set
the next row was selected - EF determined that the key was the same ((1,2) again), so it assumed this was the same row; same key -> same row, so that same entity got stored a second, third, and fourth time
So in the end, what I got back from EF was
Col1 Col2 Col3 Col4
1 2 'ABC' 42
1 2 'ABC' 42
1 2 'ABC' 42
1 2 'ABC' 42
because the key that determines uniqueness of an entity in EF was the same for each of the four rows from the database.
So this might be happening in your case, too - especially if you created a new complex type for the data returned from the stored procedure - and if the key on your EF entity (DocumentInfo_Full_Data) is not properly set to an actual, really identifying column (or set of columns) from the database. Check it out!
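The identity-map behavior described above can be sketched in plain C# (hypothetical names, not EF's actual internals): when rows are materialized through a cache keyed on a non-unique column set, every later row with the same key resolves to the first cached entity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Raw rows as the database returns them; (Col1, Col2) is the guessed key.
var rawRows = new[]
{
    (Col1: 1, Col2: 2, Col3: "ABC", Col4: 42),
    (Col1: 1, Col2: 2, Col3: "DEF", Col4: 57),
    (Col1: 1, Col2: 2, Col3: "GHI", Col4: 4711),
    (Col1: 1, Col2: 2, Col3: "JKL", Col4: 404),
};

// Simplified identity map: one materialized entity per distinct key value.
var identityMap = new Dictionary<(int, int), (int Col1, int Col2, string Col3, int Col4)>();
var materialized = rawRows
    .Select(r =>
    {
        var key = (r.Col1, r.Col2);
        if (!identityMap.TryGetValue(key, out var entity))
            identityMap[key] = entity = r;   // first row with this key wins
        return entity;                       // later rows reuse the cached entity
    })
    .ToList();

foreach (var e in materialized)
    Console.WriteLine($"{e.Col1} {e.Col2} {e.Col3} {e.Col4}");
// prints "1 2 ABC 42" four times: the same entity repeated, as EF does
```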