I have a process that extracts customer info from multiple MySQL databases based on a timestamp and stores it in a DataTable. The DataTable holds updates to existing customer info as well as new customer info.
I want to delete any dupes in the destination database (SQL Server) based on one constant value, CompanyID, and the CustomerID. My idea was to use a join to get the RecordIDs of the dupes in the destination DB, then pass that List<int> (or some other collection) to the DELETE method.
What I have:
using (var context = new DataContext(SqlConnection))
{
var tblSource = context.GetTable<tblCustomerInfo>();
var dupeIDs = from currCust in tblSource
join newCust in myTable.AsEnumerable() on currCust.CompanyID equals newCust.Field<string>("CompanyID")
where currCust.CustomerID.Equals(newCust.Field<int>("CustomerID")
select currCust.RecordID;
}
This obviously does not work. I will update with the exact error messages in a bit, but this doesn't compile.
First, is my join syntax even correct for what I am wanting to achieve?
Second, how can I write this Linq to join between a DataTable and the destination SqlServer database?
Afterthought - is it possible to, once I have a collection of dupe RecordIDs, use Linq to DELETE records from the destination database?
Edit
To clarify the process, I have incoming data tables like so and contained in a DataSet:
Table1
CompanyID CustomerID Field1 Field2 ....
1 5 ... ...
1 15 ... ...
Table2
CompanyID CustomerID Field1 Field2 ....
10 125 ... ...
10 145 ... ...
Which will all go into a single database:
Destination DB
CompanyID CustomerID Field1 Field2 ....
1 5 ... ...
1 15 ... ...
1 27 ... ...
5 15 ... ...
10 125 ... ...
10 145 ... ...
11 100 ... ...
So, in this case I would delete from the destination table the items that match from tables 1 & 2. The destination database will be growing constantly so creating a List of CustomerID does not seem feasible. However, I expect daily imports of new and updated customer info to be relatively small (in the hundreds, maybe near 1000 records).
If I cannot write a single join what other method for completing this process would be appropriate? I am trying to figure something out since it looks like I cannot actually mix Linq-to-Sql and Linq-to-Objects.
Is it possible to somehow map my data table to the entity datamap, tbl_CustomerInfo, filling an otherwise immutable var, then perform the join?
Update
Here is what I have accomplished at this point and I get the results I expect from dupes:
using (DataContext context = new DataContext(SqlConnection))
{
var custInfo = context.GetTable<tbl_CustomerInfo>();
string compID = ImportCust.Rows[0]["CompanyID"].ToString();
var imports = from cust in ImportCust.AsEnumerable()
select cust.Field<int>("CustomerID");
var dupes = from cust in custInfo
join import in imports
on cust.CustomerID equals import
where cust.CompanyID == compID
select cust;
custInfo.DeleteOnSubmit(/* what goes here */);
context.SubmitChanges();
}
My question now is, what goes into the DeleteOnSubmit(...)? I feel like I have gotten so close only to be foiled by this.
I usually tackle all of this in a stored proc for efficiency.
Add an identity field to your destination table to uniquely identify the records, then use a query like this:
DELETE d
FROM DestinationTable d JOIN (
Select CompanyID, CustomerID, Min(UniqueID) AS FirstRecID
FROM DestinationTable
GROUP BY CompanyID, CustomerID) u on u.CompanyID=d.CompanyID AND u.CustomerID=d.CustomerID
WHERE d.UniqueID <> u.FirstRecID
Alternatively, you could build two List<int> collections containing the IDs from your two sources, then use the LINQ Intersect operator to find the common items.
List<int> a = new List<int>{1,2,3,4,5,6,8, 10};
List<int> b = new List<int>{1,2,99,5,6,8, 10};
var c= a.Intersect(b); //returns the items common to both lists
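Since the question matches dupes on both CompanyID and CustomerID, intersecting plain ints may not be enough. A minimal sketch of the same idea with composite keys follows; the class name, method name, and sample values are hypothetical, and it assumes both key sets fit in memory.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CompositeKeyIntersect
{
    // Returns the (CompanyID, CustomerID) pairs present in both sequences.
    // Value tuples compare by value, so Intersect needs no custom comparer.
    public static List<(string CompanyID, int CustomerID)> CommonKeys(
        IEnumerable<(string CompanyID, int CustomerID)> imported,
        IEnumerable<(string CompanyID, int CustomerID)> existing)
    {
        return imported.Intersect(existing).ToList();
    }

    public static void Main()
    {
        // Hypothetical sample keys loosely mirroring the tables in the question.
        var imported = new[] { ("1", 5), ("1", 15), ("10", 125) };
        var existing = new[] { ("1", 5), ("1", 27), ("5", 15), ("10", 125) };

        foreach (var key in CommonKeys(imported, existing))
            Console.WriteLine($"{key.CompanyID}/{key.CustomerID}");
    }
}
```

Note that ("1", 15) and ("5", 15) share a CustomerID but are not treated as a match, which is the point of keying on both values.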
Here is what I have that works:
using (DataContext context = new DataContext(SqlConnection))
{
var custInfo = context.GetTable<tbl_CustomerInfo>();
string compID = ImportCust.Rows[0]["CompanyID"].ToString();
var imports = from cust in ImportCust.AsEnumerable()
select cust.Field<int>("CustomerID");
var dupes = from import in imports
join cust in custInfo
on import equals cust.CustomerID
where cust.CompanyID == compID
select cust;
var records = dupes.GetEnumerator();
while (records.MoveNext())
{ custInfo.DeleteOnSubmit(records.Current); }
context.SubmitChanges();
}
If there is a more efficient method, I'm interested in options.
Related
Let's suppose I have a table called Transactions.
Transactions has the following columns
OrderId,
OrderType (Can be 0 = Sale or 1 = Purchase) <--- this can increase
Amount
Now I want to get the relevant data based on the OrderType:
if OrderType = 0, join to the Sale table; otherwise, join to the Purchase table.
Currently I make three calls to the database to get the other values from the other tables. This works, but three round trips are inefficient in the long run.
My solution in SQL uses left joins:
SELECT ap.*,
coalesce(s.orderNo,p.orderNo) as orderNo
FROM apptransactions AS ap
LEFT JOIN sales AS s ON (ap.orderType = 0 and ap.orderId = s.id)
LEFT JOIN purchases AS p ON (ap.orderType = 1 and ap.orderId = p.id);
How can this query be converted to EF Core?
Why is your query not like this?
var results = context.Transactions.Select(t =>
new
{
// list the t columns you need here (there is no t.* in LINQ)
OrderNo = t.OrderType == 0 ? t.Sale.OrderNo : t.Purchase.OrderNo
});
Let EF generate any underlying joins it needs; concern yourself with getting the results you want.
This also ties in with what @caius mentions: your model is likely not high-level enough, or it is incorrectly mapped.
I'm looking for a way to return a dynamic column list from a LINQ join of two datatables.
First, this is not a duplicate. I have already studied and discarded:
C# LINQ list select columns dynamically from a joined dataset
Creating a LINQ select from multiple tables
How to do a LINQ join that behaves exactly like a physical database inner join?
(and many others)
Here is my starting point:
public static DataTable JoinDataTables(DataTable dt1, DataTable dt2, string table1KeyField, string table2KeyField, string[] columns) {
DataTable result = ( from dataRows1 in dt1.AsEnumerable()
join dataRows2 in dt2.AsEnumerable()
on dataRows1.Field<string>(table1KeyField) equals dataRows2.Field<string>(table2KeyField)
[...I NEED HELP HERE with the SELECT....]).CopyToDataTable();
return result;
}
A few notes and requirements:
There is no database engine. The data sources are large CSV files (500K+ records) being read into C# DataTables.
Because the CSVs are large, looping through each record in the join is a bad solution for performance reasons. I've already tried record looping and it's just too slow. I get great performance on the join above, but I can't find a way to have it return just the columns I want (specified by the caller) without looping records.
If I need to loop over columns in the join, that is perfectly fine, I just don't want to loop rows.
I want to be able to pass in an array of column names and return just those columns in the resulting DataTable. If both datatables being passed in happen to have a column named the same, and if that column is in my array of column names, just pass back either column because the data will be the same between the 2 columns in that case.
If I need to pass in 2 arrays (1 for each datatable's desired columns) that's fine, but 1 array of column names would be ideal.
The column list cannot be static and hardcoded into the function. The reason is because my JoinDataTables() is called from many different places in my system in order to join a wide variety of CSVs-turned-datatables, and each CSV file has very different columns.
I don't want all columns returned in the resulting DataTable -- just the columns I specify in the columns array.
So suppose, before calling JoinDataTables(), I have the following 2 datatables:
Table: T1
T1A T1B T1C T1D
==================
10 AA H1 Foo1
11 AB H1 Foo2
12 AA H2 Foo1
13 AB H2 Foo2
Table: T2
T2A T2X T2Y T2Z
==================
12 N1 O1 Yeah1
17 N2 O2 Yeah2
18 N3 O1 Yeah1
19 N4 O2 Yeah2
Now suppose we join these 2 tables like so:
ON T1.T1A = T2.T2A
select * from [join]
and that yields this resultset:
T1A T1B T1C T1D T2A T2X T2Y T2Z
====================================
12 AA H2 Foo1 12 N1 O1 Yeah1
Notice that only 1 row is yielded by the join.
Now to the crux of my question. Suppose that for a given use case, I want to return only 4 columns from this join: T1A, T1D, T2A, and T2Y. So my resultset would then look like this:
T1A T1D T2A T2Y
==================
12 Foo1 12 O1
I'd like to be able to call my JoinDataTables function like so:
DataTable dt = JoinDataTables(dt1, dt2, "T1A", "T2A", new string[] {"T1A", "T1D", "T2A", "T2Y"});
Keeping in mind performance and the fact that I don't want to loop through records (because it's slow for large sets of data), how can this be accomplished? (The join is already working well, now I just need a correct select segment (whether via new{..} or whatever you think)).
I cannot accept a solution with a hardcoded column list inside the function. I have found examples of that approach all over SO.
Any ideas?
EDIT: I'd be ok getting ALL columns back every time, but every attempt I've made to include all columns has resulted in some kind of FULL OUTER JOIN or CROSS JOIN, returning orders of magnitude more records than it should. So, I'd be open to getting all columns back, as long as I don't get the cross join.
I'm not sure of the performance with 500k records, but here is an attempted solution.
Since you are combining two subsets of DataRows from different tables, there are no easy operations that will create the subset or create a new DataTable from the subsets (though I have an extension method for flattening an IEnumerable<anon> where anon = new { DataRow1, DataRow2, ... } from a join, it would probably be slow for you).
Instead, I pre-create an answer DataTable with the columns requested and then use LINQ to build the value arrays to be added as the rows.
public static DataTable JoinDataTables(DataTable dt1, DataTable dt2, string table1KeyField, string table2KeyField, string[] columns) {
var rtnCols1 = dt1.Columns.Cast<DataColumn>().Where(dc => columns.Contains(dc.ColumnName)).ToList();
var rc1 = rtnCols1.Select(dc => dc.ColumnName).ToList();
var rtnCols2 = dt2.Columns.Cast<DataColumn>().Where(dc => columns.Contains(dc.ColumnName) && !rc1.Contains(dc.ColumnName)).ToList();
var rc2 = rtnCols2.Select(dc => dc.ColumnName).ToList();
var work = from dataRows1 in dt1.AsEnumerable()
join dataRows2 in dt2.AsEnumerable()
on dataRows1.Field<string>(table1KeyField) equals dataRows2.Field<string>(table2KeyField)
select (from c1 in rc1 select dataRows1[c1]).Concat(from c2 in rc2 select dataRows2[c2]).ToArray();
var result = new DataTable();
foreach (var rc in rtnCols1)
result.Columns.Add(rc.ColumnName, rc.DataType);
foreach (var rc in rtnCols2)
result.Columns.Add(rc.ColumnName, rc.DataType);
foreach (var rowVals in work)
result.Rows.Add(rowVals);
return result;
}
Since you were using query syntax, I did as well, but normally I would probably do the select like so:
select rc1.Select(c1 => dataRows1[c1]).Concat(rc2.Select(c2 => dataRows2[c2])).ToArray();
Updated: It is probably worthwhile to use the column ordinals instead of the names to index into each DataRow by replacing the definitions of rc1 and rc2:
var rc1 = rtnCols1.Select(dc => dc.Ordinal).ToList();
var rc1Names = rtnCols1.Select(dc => dc.ColumnName).ToHashSet();
var rtnCols2 = dt2.Columns.Cast<DataColumn>().Where(dc => columns.Contains(dc.ColumnName) && !rc1Names.Contains(dc.ColumnName)).ToList();
var rc2 = rtnCols2.Select(dc => dc.Ordinal).ToList();
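To check the approach against the T1/T2 sample from the question, here is a self-contained sketch that combines the function above with the ordinal-based update. The JoinDemo class and the Make helper are hypothetical names added for the demonstration, and every column is assumed to be a string column, as it would be when loaded from CSV.

```csharp
using System;
using System.Data;
using System.Linq;

public static class JoinDemo
{
    // The answer's approach with the ordinal-based indexing applied.
    public static DataTable JoinDataTables(DataTable dt1, DataTable dt2,
        string table1KeyField, string table2KeyField, string[] columns)
    {
        var cols1 = dt1.Columns.Cast<DataColumn>()
            .Where(dc => columns.Contains(dc.ColumnName)).ToList();
        var names1 = cols1.Select(dc => dc.ColumnName).ToHashSet();
        // Skip dt2 columns already taken from dt1 (same name, same data).
        var cols2 = dt2.Columns.Cast<DataColumn>()
            .Where(dc => columns.Contains(dc.ColumnName) && !names1.Contains(dc.ColumnName)).ToList();
        var ord1 = cols1.Select(dc => dc.Ordinal).ToList();
        var ord2 = cols2.Select(dc => dc.Ordinal).ToList();

        var result = new DataTable();
        foreach (var dc in cols1.Concat(cols2))
            result.Columns.Add(dc.ColumnName, dc.DataType);

        var work = from r1 in dt1.AsEnumerable()
                   join r2 in dt2.AsEnumerable()
                   on r1.Field<string>(table1KeyField) equals r2.Field<string>(table2KeyField)
                   select ord1.Select(o => r1[o]).Concat(ord2.Select(o => r2[o])).ToArray();

        foreach (var values in work)
            result.Rows.Add(values);
        return result;
    }

    // Hypothetical helper: builds an all-string DataTable from literal rows.
    public static DataTable Make(string[] names, params string[][] rows)
    {
        var t = new DataTable();
        foreach (var n in names) t.Columns.Add(n, typeof(string));
        foreach (var r in rows) t.Rows.Add(r.Cast<object>().ToArray());
        return t;
    }

    public static void Main()
    {
        var t1 = Make(new[] { "T1A", "T1B", "T1C", "T1D" },
            new[] { "10", "AA", "H1", "Foo1" }, new[] { "11", "AB", "H1", "Foo2" },
            new[] { "12", "AA", "H2", "Foo1" }, new[] { "13", "AB", "H2", "Foo2" });
        var t2 = Make(new[] { "T2A", "T2X", "T2Y", "T2Z" },
            new[] { "12", "N1", "O1", "Yeah1" }, new[] { "17", "N2", "O2", "Yeah2" },
            new[] { "18", "N3", "O1", "Yeah1" }, new[] { "19", "N4", "O2", "Yeah2" });

        var dt = JoinDataTables(t1, t2, "T1A", "T2A", new[] { "T1A", "T1D", "T2A", "T2Y" });
        foreach (DataRow row in dt.Rows)
            Console.WriteLine(string.Join(" ", row.ItemArray));
    }
}
```

The result columns come out in table order (dt1's selected columns first, then dt2's), not in the order of the columns array; if caller-specified ordering matters, that would need an extra step.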
I have some Ids stored in the variable below:
List<int> Ids;
Now I want to get records based on those Ids, in the same order in which they appear in Ids.
For eg: Records are like this in database:
Employee:
Id
1
2
3
4
5
Now if Ids holds the values 4,2,5,3,1, I want to get the records in exactly that order.
Query:
var data = context.Employee.Where(t => Ids.Contains(t.Id)).ToList();
But above query is giving me output like it is in table:
Id
1
2
3
4
5
Expected output :
Id
4
2
5
3
1
Update: I have already tried the solution below, but because this is Entity Framework it didn't work:
var data = context.Employee.Where(t => Ids.Contains(t.Id))
.OrderBy(d => Ids.IndexOf(d.Id)).ToList();
To make the above solution work, I have to call ToList() first:
var data = context.Employee.Where(t => Ids.Contains(t.Id)).ToList()
.OrderBy(d => Ids.IndexOf(d.Id)).ToList();
But I don't want to load the data into memory and then sort my records there.
Since the order in which the data is returned when you do not specify an ORDER BY is not determined, you have to add an ORDER BY to indicate how you want it sorted. Unfortunately you have to order based on objects/values in-memory, and cannot use that to order in your SQL query.
Therefore, the best you can do is to order in-memory once the data is retrieved from the database.
var data = context.Employee
// Add a criteria that we only want the known ids
.Where(t => Ids.Contains(t.Id))
// Anything after this is done in-memory instead of by the database
.AsEnumerable()
// Sort the results, in-memory
.OrderBy(d => Ids.IndexOf(d.Id))
// Materialize into a list
.ToList();
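One caveat with the in-memory sort: Ids.IndexOf scans the list for every comparison key, which can add up when Ids is long. A minimal sketch of a dictionary-based alternative follows; RequestedOrder and Sort are hypothetical names, and the int array stands in for the rows already fetched from the database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RequestedOrder
{
    // Sorts already-fetched rows by the position of their id in ids.
    public static List<T> Sort<T>(IEnumerable<T> rows, IReadOnlyList<int> ids, Func<T, int> getId)
    {
        // Map each id to its first position, matching IndexOf semantics.
        var position = new Dictionary<int, int>();
        for (int i = 0; i < ids.Count; i++)
        {
            if (!position.ContainsKey(ids[i]))
                position[ids[i]] = i;
        }

        // Rows whose id is not in the list sort last instead of throwing.
        return rows
            .OrderBy(r => position.TryGetValue(getId(r), out var p) ? p : int.MaxValue)
            .ToList();
    }

    public static void Main()
    {
        var ids = new List<int> { 4, 2, 5, 3, 1 };
        var fetched = new[] { 1, 2, 3, 4, 5 }; // stand-in for the query result
        Console.WriteLine(string.Join(",", Sort(fetched, ids, id => id))); // prints 4,2,5,3,1
    }
}
```

With the question's entities this would be called after AsEnumerable() or ToList(), passing something like e => e.Id as the selector.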
Without stored procedures, you can use Union and ?:, both of which are canonical functions.
I can't imagine other ways.
?:
You can use it to assign a weight to each id value and then order by the weight. You also have to generate the ?: expression using dynamic LINQ.
What is the equivalent of "CASE WHEN THEN" (T-SQL) with Entity Framework?
Dynamically generate LINQ queries
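As a rough illustration of the ?: idea, the weight expression can be built with expression trees instead of a dynamic LINQ library. This is only a sketch: the Employee class and BuildWeight are hypothetical, an in-memory IQueryable stands in for context.Employee, and whether a given EF provider translates the nested conditional into CASE WHEN (and how long an id list it tolerates) is an assumption you would need to verify.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical entity with just the key column.
public class Employee
{
    public int Id { get; set; }
}

public static class WeightedOrder
{
    // Builds e => e.Id == ids[0] ? 0 : e.Id == ids[1] ? 1 : ... : ids.Count
    public static Expression<Func<Employee, int>> BuildWeight(IReadOnlyList<int> ids)
    {
        var e = Expression.Parameter(typeof(Employee), "e");
        var id = Expression.Property(e, nameof(Employee.Id));

        // Fallback weight for ids that are not in the list.
        Expression body = Expression.Constant(ids.Count);

        // Wrap from the last id outwards so the first id ends up outermost.
        for (int i = ids.Count - 1; i >= 0; i--)
        {
            body = Expression.Condition(
                Expression.Equal(id, Expression.Constant(ids[i])),
                Expression.Constant(i),
                body);
        }
        return Expression.Lambda<Func<Employee, int>>(body, e);
    }

    public static void Main()
    {
        var ids = new List<int> { 4, 2, 5, 3, 1 };
        var employees = Enumerable.Range(1, 5)
            .Select(i => new Employee { Id = i })
            .AsQueryable();

        var ordered = employees
            .Where(x => ids.Contains(x.Id))
            .OrderBy(BuildWeight(ids))
            .ToList();

        Console.WriteLine(string.Join(",", ordered.Select(x => x.Id))); // prints 4,2,5,3,1
    }
}
```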
Union
I think this is the simplest way to achieve it. In this case you add a Where/Union for each Id.
EDIT 1
For Union, you can use code similar to this:
IQueryable<Foo> query = context.Foos.AsQueryable();
List<int> Ids = new List<int>();
Ids.AddRange(new[] {3,2,1});
bool first = true;
foreach (int id in Ids)
{
if (first)
{
query = query.Where(_ => _.FooId == id);
first = false;
}
else
{
query = query.Union(context.Foos.Where(_ => _.FooId == id));
}
}
var results = query.ToList();
This generates the following query:
SELECT
[Distinct2].[C1] AS [C1]
FROM ( SELECT DISTINCT
[UnionAll2].[C1] AS [C1]
FROM (SELECT
[Distinct1].[C1] AS [C1]
FROM ( SELECT DISTINCT
[UnionAll1].[FooId] AS [C1]
FROM (SELECT
[Extent1].[FooId] AS [FooId]
FROM [Foos] AS [Extent1]
WHERE [Extent1].[FooId] = @p__linq__0
UNION ALL
SELECT
[Extent2].[FooId] AS [FooId]
FROM [Foos] AS [Extent2]
WHERE [Extent2].[FooId] = @p__linq__1) AS [UnionAll1]
) AS [Distinct1]
UNION ALL
SELECT
[Extent3].[FooId] AS [FooId]
FROM [Foos] AS [Extent3]
WHERE [Extent3].[FooId] = @p__linq__2) AS [UnionAll2]
) AS [Distinct2]
p__linq__0 = 3
p__linq__1 = 2
p__linq__2 = 1
EDIT 2
I think the in-memory approach is the best one: it has the same network load, EF does not generate an ugly query that may not work on databases other than SQL Server, and the code is more readable. In your particular application, union/where might turn out to be better. So, in general, I would suggest trying the in-memory approach first and then, if you have performance issues, checking whether union/where is better.
I have a situation where I need data from multiple database tables.
Table 1 - has list of columns which needs to be displayed on front end html, angular kendo grid - which is configurable from separate Admin configuration.
Table 2 (joining of some other tables)- has the data which needs to be displayed on the angular front end.
My linq here which I am using currently is as below.
Query 1: to get list of columns to be displayed on Grid
var columns = from cols in _context.columns
select cols.colNames;
Query 2: Get the actual data for list
var data = from cust in _context.customer
join details in _context.custDetails on cust.id equals details.custid
join o in _context.orders on cust.id equals o.custid
where cust.id == XXXX
select new Customer
{
Id = cust.Id,
Name = cust.Name,
Address = details.Address,
City = details.City,
State = details.State,
OrderDate = o.OrderDate,
Amount = o.Amount
//15 other properties similarly
};
returns IQueryable type to Kendo DataSourceRequest
Currently, my UI makes two API calls, one for the columns and one for the actual data, and then shows or hides columns according to what is configured in the columns table.
The problem is that anyone who inspects the API calls on the network or in the browser tools can see the data returned for the columns that are supposed to be hidden, which is a security problem.
I am looking for a single API query that returns the data from the second query but only for the configured columns (there could be 30 different columns), either setting the other properties to null or not selecting them at all. Some properties must always be returned because they are used for other purposes.
I have searched many resources on how to generate a dynamic LINQ select from the configured columns.
Could someone please help me resolve this problem?
You can do something like this, assuming your columns table has a Boolean column named Display: when it is true the column is displayed, and when it is false it is not.
var columns = (from cols in _context.columns
select cols).ToList(); // note I am getting everything not just column names here...
var data = (from cust in _context.customer
            join details in _context.custDetails on cust.id equals details.custid
            join o in _context.orders on cust.id equals o.custid
            where cust.id == XXXX
            select new Customer
            {
                Id = cust.Id,
                Name = cust.Name,
                Address = details.Address,
                City = details.City,
                State = details.State,
                OrderDate = o.OrderDate,
                Amount = o.Amount
                //15 other properties similarly
            }).ToList(); // parentheses so ToList() applies to the whole query
var filteredData = (from d in data
                    select new Customer
                    {
                        // "? d.Id : null" only compiles if Id is a nullable type (e.g. int?)
                        Id = DisplayColumn("ID", columns) ? d.Id : null,
                        Name = DisplayColumn("Name", columns) ? d.Name : null,
                        Address = DisplayColumn("Address", columns) ? d.Address : null,
                        // do similarly for all other columns
                    }).AsQueryable(); // returns IQueryable<Customer>
private bool DisplayColumn(string columnName, List<Columns> cols)
{
    // Display is assumed to be a bool property on the column entity
    return cols.First(x => x.ColumnName == columnName).Display;
}
So now you have this code as part of one Web API call, which makes two SQL calls: one to get the columns and one to get the data. You then use LINQ in memory to filter out the columns you don't want (or set them to null), and return this filtered data to the UI.
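If you would rather not send the hidden properties at all (not even as nulls), one alternative is to project each row into a dictionary that contains only the configured columns. The following is a sketch under assumptions: the Customer class, the ColumnFilter.Project helper, and the sample values are hypothetical, it uses reflection on already-materialized rows, and it assumes your serializer writes a dictionary as a plain JSON object.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical DTO with a few of the question's properties.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Address { get; set; }
    public string City { get; set; }
}

public static class ColumnFilter
{
    // Keeps only the properties whose names appear in allowedColumns.
    public static List<Dictionary<string, object>> Project<T>(
        IEnumerable<T> rows, ICollection<string> allowedColumns)
    {
        // Resolve the allowed properties once, not once per row.
        var props = typeof(T).GetProperties()
            .Where(p => allowedColumns.Contains(p.Name))
            .ToArray();

        return rows
            .Select(row => props.ToDictionary(p => p.Name, p => p.GetValue(row)))
            .ToList();
    }

    public static void Main()
    {
        // Made-up sample row; Address is deliberately not in the allowed set.
        var data = new List<Customer>
        {
            new Customer { Id = 1, Name = "Ann", Address = "1 Main St", City = "Springfield" }
        };
        var allowed = new HashSet<string> { "Id", "Name", "City" };

        foreach (var row in Project(data, allowed))
            Console.WriteLine(string.Join(", ", row.Select(kv => $"{kv.Key}={kv.Value}")));
    }
}
```

Properties that must always be returned can simply be added to the allowed set before calling Project, regardless of what the columns table says.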
I have two Tables - tblExpenses and tblCategories as follows
tblExpenses
ID (PK),
Place,
DateSpent,
CategoryID (FK)
tblCategory
ID (PK),
Name
I tried various LINQ approaches to get all distinct records from the above two tables, without much success. I tried using UNION and DISTINCT, but neither worked.
The above two tables are defined in my Model section of my project which in turn will create tables in SQLite. I need to retrieve all the distinct records from both the tables to display values in gridview.
Kindly provide me some inputs to accomplish this task. I did some research to find answer to this question but nothing seemed close to what I wanted. Excuse me if I duplicated this question.
Here is the UNION, DISTINCT approaches I tried:
DISTINCT # ==> Gives me Repetitive values
(from exp in db.Table<tblExpenses >()
from cat in db.Table<tblCategory>()
select new { exp.Id, exp.CategoryId, exp.DateSpent, exp.Expense, exp.Place, cat.Name }).Distinct();
UNION # ==> Got an error while using UNION
I think Union already applies Distinct when you combine the two tables. You can try something like:
var query = (from c in db.tblExpenses select c)
    .Concat(from c in db.tblCategory select c)
    .Distinct().ToList();
You will always get DISTINCT records, since you are selecting the tblExpenses.ID too. (Unless there are multiple categories with the same ID. But that of course would be really, really bad design.)
Remember, when making a JOIN in LINQ, both field names and data types should be the same. Is the field tblExpenses.CategoryID a nullable field?
If so, try this JOIN:
db.Table<tblExpenses>()
.Join(db.Table<tblCategory>(),
exp => new { exp.CategoryId },
cat => new { CategoryId = (int?)cat.ID },
(exp, cat) => new {
exp.Id,
exp.CategoryId,
exp.DateSpent,
exp.Expense,
exp.Place,
cat.Name
})
.Select(j => new {
j.Id,
j.CategoryId,
j.DateSpent,
j.Expense,
j.Place,
j.Name
});
You can try these queries:
A SELECT DISTINCT query like this:
SELECT DISTINCT Name FROM tblCategory INNER JOIN tblExpenses ON tblCategory.ID = tblExpenses.CategoryID;
limits the results to unique values in the output field. The query results are not updateable.
or
A SELECT DISTINCTROW query like this:
SELECT DISTINCTROW Name FROM tblCategory INNER JOIN tblExpenses ON tblCategory.ID = tblExpenses.CategoryID;
looks at the entire underlying tables, not just the output fields, to find unique rows.
Reference: http://www.fmsinc.com/microsoftaccess/query/distinct_vs_distinctrow/unique_values_records.asp