Select distinct values from a large DataTable column - c#

I have a DataTable with 22 columns, one of which is called "id". I would like to query this column and keep all the distinct values in a list. The table can have anywhere between 10 and a million rows.
What is the best method to do this? Currently I am using a for loop to go through the column and compare the values: if the values are the same it moves on to the next row, and if they differ it adds the id to the array. Since the table can have up to a million rows, is there a more efficient way to do this?

Method 1:
DataView view = new DataView(table);
DataTable distinctValues = view.ToTable(true, "id");
Method 2:
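You will have to create a class whose property names match your DataTable column names. You can then use the following extension method to convert the DataTable to a List (the methods must live in a static class, and need using System.Linq and using System.Reflection):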
public static List<T> ToList<T>(this DataTable table) where T : new()
{
List<PropertyInfo> properties = typeof(T).GetProperties().ToList();
List<T> result = new List<T>();
foreach (var row in table.Rows)
{
var item = CreateItemFromRow<T>((DataRow)row, properties);
result.Add(item);
}
return result;
}
private static T CreateItemFromRow<T>(DataRow row, List<PropertyInfo> properties) where T : new()
{
T item = new T();
foreach (var property in properties)
{
if (row.Table.Columns.Contains(property.Name))
{
if (row[property.Name] != DBNull.Value)
property.SetValue(item, row[property.Name], null);
}
}
return item;
}
and then you can get the distinct ids from the list using
YourList.Select(x => x.Id).Distinct();
Please note that ToList<T>() builds complete records, not just ids; the Select above is what narrows the result down to the id values.

This will return the distinct ids:
var distinctIds = datatable.AsEnumerable()
.Select(s=> new {
id = s.Field<string>("id"),
})
.Distinct().ToList();

dt - your DataTable
ColumnName - your column name, i.e. "id"
DataView view = new DataView(dt);
DataTable distinctValues = view.ToTable(true, ColumnName);

All credit to Rajeev Kumar's answer, but it gave me a list of anonymous-type objects wrapping a string, which was not as easy to iterate over. Updating the code as below returns a List<string> that is easier to manipulate (or, for example, to drop straight into a foreach block).
var distinctIds = datatable.AsEnumerable().Select(row => row.Field<string>("id")).Distinct().ToList();
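For the million-row case in the question, the same result can also be collected in a single explicit pass with a HashSet. The following is only a minimal sketch: the class and method names (DistinctColumnSketch, DistinctValues) are made up for illustration, and it assumes the column's stored type matches the type argument and that DBNull entries should be skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class DistinctColumnSketch
{
    // Hypothetical helper (not from the thread): returns the distinct
    // non-null values of one column, in order of first appearance.
    public static List<T> DistinctValues<T>(DataTable table, string columnName)
    {
        var seen = new HashSet<T>();
        var result = new List<T>();
        foreach (DataRow row in table.Rows)
        {
            // Assumption: DBNull cells are ignored rather than reported.
            if (row.IsNull(columnName)) continue;

            T value = (T)row[columnName];
            // HashSet<T>.Add returns false when the value was already seen.
            if (seen.Add(value)) result.Add(value);
        }
        return result;
    }
}
```

It would be called as DistinctColumnSketch.DistinctValues<string>(datatable, "id"). Each row is visited once and each membership check is a hash lookup, which is the main difference from comparing values in a nested loop.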

Try this:
var idColumn="id";
var list = dt.DefaultView
.ToTable(true, idColumn)
.Rows
.Cast<DataRow>()
.Select(row => row[idColumn])
.ToList();

Sorry to post an answer on a very old thread, but it may help others in the future.
string[] TobeDistinct = {"Name","City","State"};
DataTable dtDistinct = GetDistinctRecords(DTwithDuplicate, TobeDistinct);
//Following function will return Distinct records for Name, City and State column.
public static DataTable GetDistinctRecords(DataTable dt, string[] Columns)
{
return dt.DefaultView.ToTable(true, Columns);
}

Note: Columns[0] is the column on which you want to perform the DISTINCT query and the sorting.
string columnName = DT_InputDataTable.Columns[0].ColumnName;
DataView view = new DataView(DT_InputDataTable) { Sort = columnName };
DataTable distinctValues = view.ToTable(true, columnName);

Related

How do I GroupBy one column on this DataTable

Suppose I have a call log DataTable where each row represents a call placed with the following columns:
AccountNumber1, AccountNumber2, AccountListDate, AccountDisposition
I want to GroupBy column AccountNumber1 and want a new DataTable with the same columns + 1 additional column NumCalls which will be the count of calls for each AccountNumber1.
New DataTable after GroupBy:
AccountNumber1, AccountNumber2, AccountListDate, AccountDisposition, NumCalls
So far I have the following:
table.AsEnumerable()
.GroupBy(x => x.Field<int>("AccountNumber1"))
.Select(x => new { x.Key.AccountNumber1, NumCalls = x.Count() })
.CopyToDataTable()
Which gives me a DataTable with just two columns AccountNumber1 and NumCalls. How do I get the other columns as I described above?? I would appreciate any help. Thank you.
There's no magic; you need to use a loop and initialize the new table with the new column:
DataTable tblResult = table.Clone();
tblResult.Columns.Add("NumCalls", typeof(int));
var query = table.AsEnumerable().GroupBy(r => r.Field<string>("AccountNumber1"));
foreach (var group in query)
{
DataRow newRow = tblResult.Rows.Add();
DataRow firstOfGroup = group.First();
newRow.SetField<string>("AccountNumber1", group.Key);
newRow.SetField<string>("AccountNumber2", firstOfGroup.Field<string>("AccountNumber2"));
newRow.SetField<DateTime>("AccountListDate", firstOfGroup.Field<DateTime>("AccountListDate"));
newRow.SetField<string>("AccountDisposition", firstOfGroup.Field<string>("AccountDisposition"));
newRow.SetField<int>("NumCalls", group.Count());
}
This takes arbitrary values from the first row of each group which seems to be desired.
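If the column list is not known in advance, the loop above can be written without naming each column. This is a sketch only: the helper name GroupWithCount is hypothetical, it groups on the raw cell value instead of a typed Field<T> call, and it relies on NumCalls being added as the last column so that it lines up with the appended count.

```csharp
using System;
using System.Data;
using System.Linq;

public static class GroupCountSketch
{
    // Hypothetical helper: one output row per distinct key, copying the
    // first row of each group and appending the group size as NumCalls.
    public static DataTable GroupWithCount(DataTable table, string keyColumn)
    {
        DataTable result = table.Clone();              // same schema, no rows
        result.Columns.Add("NumCalls", typeof(int));   // must stay the last column

        foreach (var group in table.AsEnumerable().GroupBy(r => r[keyColumn]))
        {
            DataRow first = group.First();
            // Values of the first row, followed by the count for the new column.
            object[] values = first.ItemArray
                .Concat(new object[] { group.Count() })
                .ToArray();
            result.Rows.Add(values);
        }
        return result;
    }
}
```

A call such as GroupCountSketch.GroupWithCount(table, "AccountNumber1") would produce the five-column table described in the question, with the same caveat that the non-key values come from whichever row happens to be first in each group.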

How to use LINQ to get unique columns from a DataTable

I have a DataTable in C# with columns defined as follows:
DataTable dt = new DataTable();
dt.Columns.Add("OrgName", typeof(string));
dt.Columns.Add("OrgExId", typeof(string));
dt.Columns.Add("UserName", typeof(string));
dt.Columns.Add("UserExId", typeof(string));
dt.Columns.Add("UserEmail", typeof(string));
"UserName", "UserExId", and "UserEmail" are all unique and they are grouped by "OrgName" and "OrgExId"
I want to write a LINQ query to make a new DataTable that contains unique "OrgExId's" and "OrgName's"
This is as far as I got:
var results = from row in dt.AsEnumerable()
group row by row["OrgExId"] into orgs
select orgs;
Specifically in this query, I don't understand how I am supposed to select the rows from the original DataTable. Visual Studio says orgs is of the type IGrouping, but I have never really seen this type before and am not sure how to manipulate it.
Is this a key value pair?
Sorry about that all. I did not specify my end result.
I want to end up with a DataTable with two columns, distinct "OrgExId" and "OrgName". (There is a one to one relationship between "OrgExId" and "OrgName")
All you really need is a Distinct clause
var output = dt.AsEnumerable()
.Select(x => new {OrgExId = x["OrgExId"], OrgName = x["OrgName"]})
.Distinct();
You can then iterate over this and build a DataTable or whatever you need.
UPDATE: You asked for the output to be a DataTable and the above solution didn't quite sit well with me since it requires extra work. To make this more efficient you could do a custom equality comparer.
Your linq looks like this...
// This returns a DataTable
var output = dt.AsEnumerable()
.Distinct(new OrgExIdEqualityComparer())
.CopyToDataTable();
And your comparer looks like this...
public class OrgExIdEqualityComparer : IEqualityComparer<DataRow>
{
public bool Equals(DataRow x, DataRow y)
{
return x["OrgExId"].Equals(y["OrgExId"]);
}
public int GetHashCode(DataRow obj)
{
return obj["OrgExId"].GetHashCode();
}
}
Use the Key property of IGrouping:
var results = from row in dt.AsEnumerable()
group row by new {
OrgExId = row.Field<string>("OrgExId"),
OrgName = row.Field<string>("OrgName")
} into orgs
select orgs.Key;
This gives you a collection of anonymous-type objects. To get a DataTable, you can simply iterate over the results and add each one to a new DataTable.
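The "iterate and add" step can be sketched as follows. The names DistinctOrgsSketch and DistinctOrgs are invented for this example, and it assumes both columns are string columns as in the question's table definition.

```csharp
using System;
using System.Data;
using System.Linq;

public static class DistinctOrgsSketch
{
    // Hypothetical helper: builds a two-column table of distinct
    // (OrgExId, OrgName) pairs from the source table.
    public static DataTable DistinctOrgs(DataTable dt)
    {
        // Anonymous types compare by value, so Distinct() removes repeated pairs.
        var pairs = dt.AsEnumerable()
            .Select(r => new
            {
                OrgExId = r.Field<string>("OrgExId"),
                OrgName = r.Field<string>("OrgName")
            })
            .Distinct();

        var result = new DataTable();
        result.Columns.Add("OrgExId", typeof(string));
        result.Columns.Add("OrgName", typeof(string));

        foreach (var pair in pairs)
        {
            result.Rows.Add(pair.OrgExId, pair.OrgName);
        }
        return result;
    }
}
```

For this particular shape of result, dt.DefaultView.ToTable(true, "OrgExId", "OrgName") should give an equivalent table without any LINQ.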
DataTable dt = new DataTable();
dt.Columns.Add("OrgName", typeof(string));
dt.Columns.Add("OrgExId", typeof(string));
dt.Columns.Add("UserName", typeof(string));
dt.Columns.Add("UserExId", typeof(string));
dt.Columns.Add("UserEmail", typeof(string));
// put some data for testing purpose
var id = Guid.NewGuid().ToString();
for (var i = 0; i < 10; i++)
dt.Rows.Add(id, i.ToString(), "user_name", Guid.NewGuid().ToString());
var names = dt.Rows.Cast<DataRow>().Select(r => r.Field<string>("UserName")).Distinct();
Console.WriteLine(string.Join(", ", names));

How do I use LINQ to filter a datatable against a List of strings that need to be split?

I have a datatable and I want to use LINQ to filter it against a List of strings, where each string is delimited with the pipe ('|') and contains two values.
The list (List<string> Actions) looks like this. There are only two strings in this list, but it can have many more.
8/1/2013 9:57:52 PM|Login for bill.lock#cap.com
8/1/2013 9:57:37 PM|Login for bill.lock#cap.com
The datatable has five (5) fields in each row, and I'm using each string from the list above to compare two fields (Text and Time) in the datatable to omit or delete those rows.
The datatable is structured like this
DataTable stdTable = new DataTable("Actions");
DataColumn col1 = new DataColumn("Area");
DataColumn col2 = new DataColumn("Action");
DataColumn col3 = new DataColumn("Time");
DataColumn col4 = new DataColumn("Text");
Currently I'm manually performing all this, but I know it can be done in LINQ with just a few lines of code. I'm not sure how to iterate thru the list and use the split. I saw this example, but the split is beyond me.
// Get all checked id's.
var ids = chkGodownlst.Items.OfType<ListItem>()
.Where(cBox => cBox.Selected)
.Select(cBox => cBox.Value)
.ToList();
// Now get all the rows that has a CountryID in the selected id's list.
var a = dt.AsEnumerable().Where(r =>
ids.Any(id => id == r.Field<int>("CountryID"))
);
// Create a new table.
DataTable newTable = a.CopyToDataTable();
Any help would be appreciated.
Thanks
var list = new List<string> {
"8/1/2013 9:57:52 PM|Login for bill.lock#cap.com",
"8/1/2013 9:57:37 PM|Login for bill.lock#cap.com"
};
// The columns were created without a type, so they hold strings.
var a = dt.AsEnumerable().Where(x=>
!list.Select(y=> new {
Time = DateTime.Parse(y.Split('|')[0]),
Text = y.Split('|')[1]
})
.Any(z=> z.Time == DateTime.Parse(x.Field<string>("Time")) && z.Text == x.Field<string>("Text")));
or
var a = dt.AsEnumerable().Where(x=>
!list.Any(y=> y == string.Format("{0}|{1}",x["Time"],x["Text"])));
DataTable newTable = a.CopyToDataTable();
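With a long list, scanning it once per row becomes the expensive part. A possible variation is to put the strings into a HashSet first; this is a sketch under the assumption that the Time and Text columns hold strings in exactly the same format as the list entries, and the names FilterSketch and RemoveListed are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class FilterSketch
{
    // Hypothetical helper: returns a copy of dt without the rows whose
    // "Time|Text" combination appears in the actions list.
    public static DataTable RemoveListed(DataTable dt, IEnumerable<string> actions)
    {
        var keys = new HashSet<string>(actions);

        // Rebuild the same "Time|Text" key for every row and look it up.
        var kept = dt.AsEnumerable()
            .Where(r => !keys.Contains(r["Time"] + "|" + r["Text"]));

        // Clone + ImportRow is used because CopyToDataTable() throws
        // when the sequence contains no rows.
        DataTable result = dt.Clone();
        foreach (DataRow row in kept)
        {
            result.ImportRow(row);
        }
        return result;
    }
}
```

If the Time column were a real DateTime column, the key would have to be built with an explicit format string instead of relying on the stored text.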

How to get all column names from multiple tables in a DataSet with LINQ

I have multiple tables in a DataSet, and I need to get all the column names from those DataTables.
Is it possible to do this with LINQ? At the moment I am using for loops to get the column names.
LINQ without loops
var columnNames = (from table in dataSet.Tables.OfType<DataTable>()
from column in table.Columns.OfType<DataColumn>()
select column.ColumnName).ToList();
var columns = dataSet.Tables
.Cast<DataTable>()
.SelectMany(t=>t.Columns
.Cast<DataColumn>()
.Select(c=>c.ColumnName));
or if you want the table name as well
var columns = dataSet.Tables
.Cast<DataTable>()
.SelectMany(t=>t.Columns
.Cast<DataColumn>()
.Select(c=> new {
t.TableName,
c.ColumnName
}
)
);
The Cast<> is necessary to turn the non-generic IEnumerable Tables and Columns properties into IEnumerable<T> types that can be used in LINQ queries.
Try this:
List<string> result = new List<string>();
foreach (DataTable item in dataSet.Tables)
{
result.AddRange(item.Columns.Cast<DataColumn>().Select(i => i.ColumnName).ToList());
}

Best Practice: Convert LINQ Query result to a DataTable without looping

What is the best practice for converting a LINQ query result to a new DataTable?
Is there a better solution than a foreach over every result item?
EDIT
AnonymousType
var rslt = from eisd in empsQuery
join eng in getAllEmployees()
on eisd.EMPLOYID.Trim() equals eng.EMPLOYID.Trim()
select new
{
eisd.CompanyID,
eisd.DIRECTID,
eisd.EMPLOYID,
eisd.INACTIVE,
eisd.LEVEL,
eng.EnglishName
};
EDIT 2:
I got exception:
Local sequence cannot be used in LINQ to SQL implementation of query operators except the Contains() operator.
when I tried to execute the query,
and found the solution in "IEnumerable.Except wont work, so what do I do?" and "Need linq help".
Use LINQ to DataSet. From MSDN: Creating a DataTable From a Query (LINQ to DataSet)
// Query the SalesOrderHeader table for orders placed
// after August 1, 2001.
IEnumerable<DataRow> query =
from order in orders.AsEnumerable()
where order.Field<DateTime>("OrderDate") > new DateTime(2001, 8, 1)
select order;
// Create a table from the query.
DataTable boundTable = query.CopyToDataTable<DataRow>();
If you have anonymous types :
From the Coder Blog : Using Linq anonymous types and CopyDataTable
It explains how to use MSDN's How to: Implement CopyToDataTable Where the Generic Type T Is Not a DataRow
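One thing to watch for with CopyToDataTable() is that it throws an InvalidOperationException when the query yields no rows. A small guard can be sketched like this; the name CopyOrEmpty and the idea of falling back to an empty clone of a schema table are assumptions made for this example, not part of the MSDN sample.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class CopySketch
{
    // Hypothetical helper: like CopyToDataTable(), but returns an empty
    // table with the schema of schemaSource when there are no rows.
    public static DataTable CopyOrEmpty(IEnumerable<DataRow> rows, DataTable schemaSource)
    {
        // Materialize once so the query is not executed twice.
        List<DataRow> list = rows.ToList();
        return list.Count > 0
            ? list.CopyToDataTable()
            : schemaSource.Clone();
    }
}
```

With the query above, this would be called as CopySketch.CopyOrEmpty(query, orders).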
A generic function for converting a query result to a DataTable:
DataTable ddt = LINQResultToDataTable(query);
public DataTable LINQResultToDataTable<T>(IEnumerable<T> Linqlist)
{
DataTable dt = new DataTable();
PropertyInfo[] columns = null;
if (Linqlist == null) return dt;
foreach (T Record in Linqlist)
{
if (columns == null)
{
columns = ((Type)Record.GetType()).GetProperties();
foreach (PropertyInfo GetProperty in columns)
{
Type colType = GetProperty.PropertyType;
if ((colType.IsGenericType) && (colType.GetGenericTypeDefinition()
== typeof(Nullable<>)))
{
colType = colType.GetGenericArguments()[0];
}
dt.Columns.Add(new DataColumn(GetProperty.Name, colType));
}
}
DataRow dr = dt.NewRow();
foreach (PropertyInfo pinfo in columns)
{
dr[pinfo.Name] = pinfo.GetValue(Record, null) ?? DBNull.Value;
}
dt.Rows.Add(dr);
}
return dt;
}
I am using the morelinq 2.2.0 package in an ASP.NET web application. Install it from the NuGet Package Manager Console:
PM> Install-Package morelinq
Namespace
using MoreLinq;
My sample stored procedure sp_Profile() returns profile details:
DataTable dt = context.sp_Profile().ToDataTable();
Use System.Reflection and iterate for each record in the query object.
Dim dtResult As New DataTable
Dim t As Type = objRow.GetType
Dim pi As PropertyInfo() = t.GetProperties()
For Each p As PropertyInfo In pi
dtResult.Columns.Add(p.Name)
Next
Dim newRow = dtResult.NewRow()
For Each p As PropertyInfo In pi
newRow(p.Name) = p.GetValue(objRow,Nothing)
Next
dtResult.Rows.Add(newRow.ItemArray)
Return dtResult
Try:
DataTable dt = (from rec in dr.AsEnumerable() select rec).CopyToDataTable();
