Suppose I have a table with 5 columns (col_1, col_2, col_3, col_4, col_5).
I am trying to use a LINQ query to group by two columns (col_2, col_3), select the first record in each group ordered by the fourth column (col_4), and get the entire row with all 5 columns for every record returned.
from abcd in Context.table
group abcd by new { abcd.col2, abcd.col2 }
into temp orderby temp.orderby(x=> x.col_4)
select ..
Here I am confused about how to get the entire row.
I am also not sure whether this ordering logic will work the way I want.
I am using Entity Framework, and I already have an entity type for that row,
and I have created an object which is a list of that class.
So it would be best if I could fetch the result directly into that object.
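Each group is a collection of the original items, so you can use LINQ methods like OrderBy and First on each group:
from abcd in Context.table
group abcd by new { abcd.col_2, abcd.col_3 }
into temp
select temp.OrderBy(x => x.col_4).First()
If the provider rejects First() inside the query (Entity Framework 6 does, since it only allows First as the final query operation), use FirstOrDefault() instead.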
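To see the grouping behave as described, here is a minimal in-memory sketch using LINQ to Objects rather than Entity Framework. The Row class, the FirstPerGroup helper and the sample values are all hypothetical stand-ins for the asker's entity type and table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity standing in for the EF-mapped row type.
public class Row
{
    public int col_1 { get; set; }
    public string col_2 { get; set; }
    public string col_3 { get; set; }
    public int col_4 { get; set; }
    public string col_5 { get; set; }
}

public static class FirstPerGroup
{
    // For each (col_2, col_3) pair, return the whole row with the smallest col_4.
    public static List<Row> Pick(IEnumerable<Row> table)
    {
        return (from abcd in table
                group abcd by new { abcd.col_2, abcd.col_3 } into temp
                select temp.OrderBy(x => x.col_4).First()).ToList();
    }

    // Made-up sample data: two groups, (A, X) and (B, Y).
    public static List<Row> Sample()
    {
        return new List<Row>
        {
            new Row { col_1 = 1, col_2 = "A", col_3 = "X", col_4 = 30, col_5 = "r1" },
            new Row { col_1 = 2, col_2 = "A", col_3 = "X", col_4 = 10, col_5 = "r2" },
            new Row { col_1 = 3, col_2 = "B", col_3 = "Y", col_4 = 5,  col_5 = "r3" },
            new Row { col_1 = 4, col_2 = "B", col_3 = "Y", col_4 = 7,  col_5 = "r4" },
        };
    }
}
```

Calling FirstPerGroup.Pick(FirstPerGroup.Sample()) yields one full Row per group, so the result can be assigned directly to a List<Row>.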
I'm looking for a way to return a dynamic column list from a LINQ join of two datatables.
First, this is not a duplicate. I have already studied and discarded:
C# LINQ list select columns dynamically from a joined dataset
Creating a LINQ select from multiple tables
How to do a LINQ join that behaves exactly like a physical database inner join?
(and many others)
Here is my starting point:
public static DataTable JoinDataTables(DataTable dt1, DataTable dt2, string table1KeyField, string table2KeyField, string[] columns) {
DataTable result = ( from dataRows1 in dt1.AsEnumerable()
join dataRows2 in dt2.AsEnumerable()
on dataRows1.Field<string>(table1KeyField) equals dataRows2.Field<string>(table2KeyField)
[...I NEED HELP HERE with the SELECT....]).CopyToDataTable();
return result;
}
A few notes and requirements:
There is no database engine. The data sources are large CSV files (500K+ records) being read into C# DataTables.
Because the CSVs are large, looping through each record in the join is a bad solution for performance reasons. I've already tried record looping and it's just too slow. I get great performance on the join above, but I can't find a way to have it return just the columns I want (specified by the caller) without looping records.
If I need to loop over columns in the join, that is perfectly fine, I just don't want to loop rows.
I want to be able to pass in an array of column names and return just those columns in the resulting DataTable. If both datatables being passed in happen to have a column named the same, and if that column is in my array of column names, just pass back either column because the data will be the same between the 2 columns in that case.
If I need to pass in 2 arrays (1 for each datatable's desired columns) that's fine, but 1 array of column names would be ideal.
The column list cannot be static and hardcoded into the function. The reason is because my JoinDataTables() is called from many different places in my system in order to join a wide variety of CSVs-turned-datatables, and each CSV file has very different columns.
I don't want all columns returned in the resulting DataTable -- just the columns I specify in the columns array.
So suppose, before calling JoinDataTables(), I have the following 2 datatables:
Table: T1
T1A T1B T1C T1D
==================
10 AA H1 Foo1
11 AB H1 Foo2
12 AA H2 Foo1
13 AB H2 Foo2
Table: T2
T2A T2X T2Y T2Z
==================
12 N1 O1 Yeah1
17 N2 O2 Yeah2
18 N3 O1 Yeah1
19 N4 O2 Yeah2
Now suppose we join these 2 tables like so:
ON T1.T1A = T2.T2A
select * from [join]
and that yields this resultset:
T1A T1B T1C T1D T2A T2X T2Y T2Z
====================================
12 AA H2 Foo1 12 N1 O1 Yeah1
Notice that only 1 row is yielded by the join.
Now to the crux of my question. Suppose that for a given use case, I want to return only 4 columns from this join: T1A, T1D, T2A, and T2Y. So my resultset would then look like this:
T1A T1D T2A T2Y
==================
12 Foo1 12 O1
I'd like to be able to call my JoinDataTables function like so:
DataTable dt = JoinDataTables(dt1, dt2, "T1A", "T2A", new string[] {"T1A", "T1D", "T2A", "T2Y"});
Keeping in mind performance and the fact that I don't want to loop through records (because it's slow for large sets of data), how can this be accomplished? (The join is already working well, now I just need a correct select segment (whether via new{..} or whatever you think)).
I cannot accept a solution with a hardcoded column list inside the function. I have found examples of that approach all over SO.
Any ideas?
EDIT: I'd be ok getting ALL columns back every time, but every attempt I've made to include all columns has resulted in some kind of FULL OUTER JOIN or CROSS JOIN, returning orders of magnitude more records than it should. So, I'd be open to getting all columns back, as long as I don't get the cross join.
I'm not sure of the performance with 500k records, but here is an attempted solution.
Since you are combining two subsets of DataRows from different tables, there are no easy operations that will create the subset or create a new DataTable from the subsets (though I have an extension method for flattening an IEnumerable<anon> where anon = new { DataRow1, DataRow2, ... } from a join, it would probably be slow for you).
Instead, I pre-create an answer DataTable with the columns requested and then use LINQ to build the value arrays to be added as the rows.
public static DataTable JoinDataTables(DataTable dt1, DataTable dt2, string table1KeyField, string table2KeyField, string[] columns) {
var rtnCols1 = dt1.Columns.Cast<DataColumn>().Where(dc => columns.Contains(dc.ColumnName)).ToList();
var rc1 = rtnCols1.Select(dc => dc.ColumnName).ToList();
var rtnCols2 = dt2.Columns.Cast<DataColumn>().Where(dc => columns.Contains(dc.ColumnName) && !rc1.Contains(dc.ColumnName)).ToList();
var rc2 = rtnCols2.Select(dc => dc.ColumnName).ToList();
var work = from dataRows1 in dt1.AsEnumerable()
join dataRows2 in dt2.AsEnumerable()
on dataRows1.Field<string>(table1KeyField) equals dataRows2.Field<string>(table2KeyField)
select (from c1 in rc1 select dataRows1[c1]).Concat(from c2 in rc2 select dataRows2[c2]).ToArray();
var result = new DataTable();
foreach (var rc in rtnCols1)
result.Columns.Add(rc.ColumnName, rc.DataType);
foreach (var rc in rtnCols2)
result.Columns.Add(rc.ColumnName, rc.DataType);
foreach (var rowVals in work)
result.Rows.Add(rowVals);
return result;
}
Since you were using query syntax, I did as well, but normally I would probably do the select like so:
select rc1.Select(c1 => dataRows1[c1]).Concat(rc2.Select(c2 => dataRows2[c2])).ToArray();
Updated: It is probably worthwhile to use the column ordinals instead of the names to index into each DataRow by replacing the definitions of rc1 and rc2:
var rc1 = rtnCols1.Select(dc => dc.Ordinal).ToList();
var rc1Names = rtnCols1.Select(dc => dc.ColumnName).ToHashSet();
var rtnCols2 = dt2.Columns.Cast<DataColumn>().Where(dc => columns.Contains(dc.ColumnName) && !rc1Names.Contains(dc.ColumnName)).ToList();
var rc2 = rtnCols2.Select(dc => dc.Ordinal).ToList();
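Putting the ordinal-based update together with the original function gives something like the sketch below. It assumes, as the question does, that both key columns hold strings. The DataTableJoin class name and the MakeTable helper are hypothetical and exist only to build the T1/T2 sample tables from the question. HashSet<string> is constructed directly instead of calling ToHashSet(), which is not available on older .NET Framework versions.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class DataTableJoin
{
    public static DataTable JoinDataTables(DataTable dt1, DataTable dt2,
        string table1KeyField, string table2KeyField, string[] columns)
    {
        var wanted = new HashSet<string>(columns);

        // Requested columns from table 1, then requested columns from table 2
        // that were not already taken from table 1.
        var cols1 = dt1.Columns.Cast<DataColumn>()
            .Where(dc => wanted.Contains(dc.ColumnName)).ToList();
        var names1 = new HashSet<string>(cols1.Select(dc => dc.ColumnName));
        var cols2 = dt2.Columns.Cast<DataColumn>()
            .Where(dc => wanted.Contains(dc.ColumnName) && !names1.Contains(dc.ColumnName)).ToList();

        // Index rows by ordinal rather than by name.
        var ord1 = cols1.Select(dc => dc.Ordinal).ToArray();
        var ord2 = cols2.Select(dc => dc.Ordinal).ToArray();

        var result = new DataTable();
        foreach (var dc in cols1.Concat(cols2))
            result.Columns.Add(dc.ColumnName, dc.DataType);

        var work = from r1 in dt1.AsEnumerable()
                   join r2 in dt2.AsEnumerable()
                   on r1.Field<string>(table1KeyField) equals r2.Field<string>(table2KeyField)
                   select ord1.Select(o => r1[o]).Concat(ord2.Select(o => r2[o])).ToArray();

        foreach (var values in work)
            result.Rows.Add(values);
        return result;
    }

    // Hypothetical helper: builds an all-string DataTable for the demo.
    public static DataTable MakeTable(string name, string[] cols, params string[][] rows)
    {
        var dt = new DataTable(name);
        foreach (var c in cols) dt.Columns.Add(c, typeof(string));
        foreach (var r in rows) dt.Rows.Add(r);
        return dt;
    }
}
```

Note that the result columns come out in table order (table 1's columns first, then table 2's), not in the order of the columns array. The final foreach still visits each joined row once, because the rows have to be added to the result table one way or another.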
I have some SQL tables that I need to query information from. My current query, which returns a single-column list, is:
from f in FactSales
where f.DateKey == 20130921
where f.CompanyID <= 1
join item in DimMenuItems
on f.MenuItemKey equals item.MenuItemKey
join dmi in DimMenuItemDepts
on item.MenuItemDeptKey equals dmi.MenuItemDeptKey
group f by dmi.MenuItemDeptKey into c
select new {
Amount = c.Sum(l=>l.Amount)
}
This returns the data I want, and it groups correctly by the third table I join, but I cannot get the Description column from the dmi table. I have tried to add the field
Description = dmi.Description
but it doesn't work. How can I get data from the third table into the new select that I am creating with this statement? Many thanks for any help.
Firstly, you are using Entity Framework COMPLETELY WRONG. LINQ is NOT SQL.
You shouldn't be using join. Instead you should be using Associations.
So instead, your query should look like...
from sale in FactSales
where sale.DateKey == 20130921
where sale.CompanyID <= 1
group sale by sale.Item.Department into c
select new
{
Amount = c.Sum(l => l.Amount),
Department = c.Key
}
By following Associations, you will automatically be implicitly joining.
You should not be grouping by the id of the "table" but by the actual "row". In object parlance (which is what you should be using in EF, since the raison d'etre of an ORM is to convert DB to objects), you should be grouping by the "entity" rather than by the "entity's key".
EF already knows that the key is unique to the entity.
After the group ... into clause, you can only reach the grouped sales (the elements of c) and sale.Item.Department (as c.Key); the earlier range variables are out of scope. It is a transform, rather than an operator like in SQL.
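If the model has no navigation properties to follow, another way to get Description is to keep the joins and put the column into the grouping key, so that it is available as c.Key.Description after the group clause. The sketch below runs against in-memory lists instead of a database. The three classes mirror the names in the question but their shapes are assumptions, and DeptTotals is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shapes for the three tables named in the question.
public class FactSale { public int MenuItemKey; public decimal Amount; }
public class DimMenuItem { public int MenuItemKey; public int MenuItemDeptKey; }
public class DimMenuItemDept { public int MenuItemDeptKey; public string Description; }

public static class DeptTotals
{
    // Returns (Description, total Amount) pairs, one per department.
    public static List<KeyValuePair<string, decimal>> Totals(
        IEnumerable<FactSale> sales,
        IEnumerable<DimMenuItem> items,
        IEnumerable<DimMenuItemDept> depts)
    {
        var query = from f in sales
                    join item in items on f.MenuItemKey equals item.MenuItemKey
                    join dmi in depts on item.MenuItemDeptKey equals dmi.MenuItemDeptKey
                    // Description is part of the key, so it survives the grouping.
                    group f by new { dmi.MenuItemDeptKey, dmi.Description } into c
                    select new KeyValuePair<string, decimal>(
                        c.Key.Description, c.Sum(l => l.Amount));
        return query.ToList();
    }
}
```

Because Description is functionally dependent on MenuItemDeptKey, adding it to the key does not change how the rows are grouped.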
I have a query like the one below. I tried to filter out duplicates by using GROUP BY:
SELECT contacts.rowid AS ROW_PASS,
duty_rota.rowid AS ROW_PASS_ROTA,
duty_rota.duty_type AS DUTY_TYPE
FROM duty_rota,
duty_types,
contacts
WHERE duty_rota.duty_type = duty_types.duty_type
AND duty_rota.duty_officer = contacts.duty_id
AND sname IS NOT NULL
GROUP BY contacts.rowid,
duty_rota.rowid,
duty_rota.duty_type
ORDER BY duty_date
After playing with the query a little, I learned that GROUP BY cannot filter out duplicates while ROWID is being selected. Can somebody please help me write SQL with the following logic:
if (any row is completely identical with another row of the query o/p)
{
then display only one column
}
I will be using the output as a GridView's data source in C#, so if this cannot be done in SQL, can it be done in C# so that identical rows are displayed only once?
If you want to filter duplicate rows, you can use this query:
SELECT Max(duty_rota.rowid) AS ROW_PASS_ROTA,
duty_rota.duty_type AS DUTY_TYPE
FROM duty_rota,
duty_types,
contacts
WHERE duty_rota.duty_type = duty_types.duty_type
AND duty_rota.duty_officer = contacts.duty_id
AND sname IS NOT NULL
GROUP BY duty_rota.duty_type
ORDER BY DUTY_TYPE
Here you go: http://sqlfiddle.com/#!2/2a038/2
Take out the ROWIDs. For example, if your table has 3 columns (colA, colB, colC), you could find exact duplicate rows this way:
select a.* from
(
select count(*) dupCnt, colA, colB, colC from myTable
group by colA, colB, colC
) a
where dupCnt > 1
First, the ROWID is unique for each row, so as long as you select it you will never have duplicates. The only solution here is to not use it. Its data does not hold anything you would want to display anyway.
Simply put, if you want no duplicates, you need the DISTINCT keyword:
SELECT DISTINCT field1,
field2
FROM table1,
table2
WHERE table1.key1 = table2.key1;
This will select all Field1, Field2 combinations from the two tables. Due to the DISTINCT keyword, each line will only be in the result list once. Duplicates will not be in the result list.
SELECT DISTINCT duty_rota.duty_type AS DUTY_TYPE
FROM duty_rota,
duty_types,
contacts
WHERE duty_rota.duty_type = duty_types.duty_type
AND duty_rota.duty_officer = contacts.duty_id
AND sname IS NOT NULL
ORDER BY duty_rota.duty_type
You will only need to GROUP BY if you need further operations on the result set, like counting the duplicates. If all you need is "no duplicates", the DISTINCT keyword is exactly what you are looking for.
Edit:
In case I misread your question and you want to see only the rows that are duplicated, you need to group and then filter on the groups' criteria. You can do that using the HAVING clause. It is a kind of additional WHERE applied to the groups:
SELECT FIELD1, FIELD2, COUNT(*)
FROM TABLE1, TABLE2
WHERE TABLE1.KEY1 = TABLE2.KEY1
GROUP BY FIELD1, FIELD2
HAVING COUNT(*) > 1
I am trying this query to get all the cities:
var queryAllCustomers = from cust in loadedCustomData.Descendants("record")
select (string)cust.Element("City") ;
It returns all cities, including repeated ones, but I only want distinct cities, i.e. each city listed only once. How can I achieve that?
Use the Distinct extension method:
var queryAllCustomers = (from cust in loadedCustomData.Descendants("record")
select (string)cust.Element("City")).Distinct();
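As a self-contained check of the above, here is a small sketch that runs the same query over an XML fragment. The DistinctCities helper is hypothetical, and the element names record and City are taken from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class DistinctCities
{
    // Returns each City value once, however many records repeat it.
    public static List<string> Get(XElement loadedCustomData)
    {
        return (from cust in loadedCustomData.Descendants("record")
                select (string)cust.Element("City")).Distinct().ToList();
    }
}
```

Distinct() uses the default equality comparer for strings, so "London" and "london" would count as different cities. An overload that takes an IEqualityComparer<string>, such as StringComparer.OrdinalIgnoreCase, can be used if that matters.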
I want to filter the data in a DataTable using LINQ.
My scenario: I have a dynamically created array of dates, and the DataTable has columns such as id, date, etc.
We have to retrieve the ids that have all the dates in the array.
ex:
string[] arr = {"10/10/2012", "11/11/2012", "9/9/2012"};
Table :
ID date
1 10/10/2012
2 11/11/2012
1 9/9/2012
6 9/9/2012
3 9/9/2012
6 11/11/2012
1 11/11/2012
Output would be 1 - because only id '1' has all the array elements.
To accomplish the above, I am using the LINQ query shown below, but it is not working.
Dim volunteers As DataTable =
(From leftTable In dtavailableVolunteers.AsEnumerable()
Join rightTable In dtavailableVolunteers.AsEnumerable()
On leftTable.VolunteerId Equals rightTable.VolunteerId
Where SelectedDatesArray.All(Function(i) rightTable.Field(Of String)("SelectedDate").Equals(i.ToString()))
Select rightTable).CopyToDataTable()
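Let's say your DataTable is dt:
DataRow[] dr = dt.Select("date in (" + string.Join(",", arr.Select(d => "'" + d + "'")) + ")");
string[] st = dr.Select(ss => ss["id"].ToString()).ToArray();
OR
DataTable newdt = dr.CopyToDataTable();
The second line uses LINQ. Note that this filter keeps rows whose date matches any value in arr, so it does not by itself limit the result to ids that have every date.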
You could group the rows by ID and then keep only the groups for which no element of arr is missing from the group's dates. Something like:
var result = from item in list
group item by item.ID into grouping
where !arr.Any(date =>
!grouping.Select(x => x.Date).Contains(date))
select grouping.Key;
Here is another version:
from volunteer in dtavailableVolunteers
group volunteer by volunteer.Id into g
let volunteerDates = g.Select(groupedElement=>groupedElement.date)
where arr.All(date=>volunteerDates.Contains(date))
select g.Key
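Applied to an actual DataTable, the grouping approach might look like the sketch below. It assumes an int column named ID and a string column named date, matching the sample table in the question. AllDates and its two methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class AllDates
{
    // Returns the IDs whose rows cover every date in arr.
    public static List<int> IdsWithAllDates(DataTable dt, string[] arr)
    {
        return (from row in dt.AsEnumerable()
                group row.Field<string>("date") by row.Field<int>("ID") into g
                where arr.All(date => g.Contains(date))
                select g.Key).ToList();
    }

    // Builds the sample table from the question.
    public static DataTable Sample()
    {
        var dt = new DataTable();
        dt.Columns.Add("ID", typeof(int));
        dt.Columns.Add("date", typeof(string));
        dt.Rows.Add(1, "10/10/2012");
        dt.Rows.Add(2, "11/11/2012");
        dt.Rows.Add(1, "9/9/2012");
        dt.Rows.Add(6, "9/9/2012");
        dt.Rows.Add(3, "9/9/2012");
        dt.Rows.Add(6, "11/11/2012");
        dt.Rows.Add(1, "11/11/2012");
        return dt;
    }
}
```

With the question's array of three dates, IdsWithAllDates(AllDates.Sample(), arr) returns a list containing only 1: ID 6 has two of the dates but lacks 10/10/2012. The comparison is on the raw strings, so "9/9/2012" and "09/09/2012" would not match.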