I want to select the gesellschaft_id values that have duplicates. I used the code below, but it selects distinct gesellschaft_id values instead. How do I write the Select expression so that it returns the rows whose gesellschaft_id occurs more than once in the DataTable?
foreach (DataRow dr1 in table1.Rows)
{
DataRow[] drDup = table2.Select("('" + dr1[0].ToString() + "' = gesellschaft_id ) AND Count(gesellschaft_id)>1");
}
This gives you the DataRows grouped by gesellschaft_id, keeping only the groups whose gesellschaft_id exists in more than one row:
var rowsWithADuplicateGesellschaftId = table1.Rows
.Cast<DataRow>()
.GroupBy(row => row["gesellschaft_id"])
.Where(group => group.Count() > 1)
.ToArray();
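If a flat list of rows is needed rather than groups, the grouping above can be flattened again with SelectMany. The following is only a sketch: the class name DuplicateFinder and the method RowsWithDuplicateValue are hypothetical names chosen for illustration, and the column is looked up by name.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class DuplicateFinder
{
    // Hypothetical helper: returns every row whose value in columnName
    // occurs in more than one row of the table.
    public static List<DataRow> RowsWithDuplicateValue(DataTable table, string columnName)
    {
        return table.Rows
            .Cast<DataRow>()
            .GroupBy(row => row[columnName])      // one group per distinct value
            .Where(group => group.Count() > 1)    // keep only values seen more than once
            .SelectMany(group => group)           // flatten the groups back into rows
            .ToList();
    }
}
```

Called as DuplicateFinder.RowsWithDuplicateValue(table2, "gesellschaft_id"), it should return all rows that share their gesellschaft_id with at least one other row.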
public ArrayList FindDuplicateRows(DataTable dTable, string colName)
{
Hashtable hTable = new Hashtable();
ArrayList duplicateList = new ArrayList();
//add duplicate item value in arraylist.
foreach (DataRow drow in dTable.Rows)
{
if (hTable.Contains(drow[colName]))
duplicateList.Add(drow);
else
hTable.Add(drow[colName], string.Empty);
}
return duplicateList;
}
Some useful links on finding and removing duplicate rows:
http://www.dotnetspider.com/resources/4535-Remove-duplicate-records-from-table.aspx
http://www.dotnetspark.com/kb/94-remove-duplicate-rows-value-from-datatable.aspx
You could do something like this, assuming that the first column is the one which you want to check for duplicates. This of course assumes that you have Linq available.
var duplicateIds = table2.AsEnumerable()
.GroupBy(row => row[0])
.Where(x => x.Count() > 1);
If you add a bit more detail to the question, that would be helpful.
I have a DataTable as shown below:
I apply the following LINQ expression to this DataTable:
if (dt.AsEnumerable().All(row => string.IsNullOrEmpty(row.Field<string>("SameReferences"))))
BindOldReferences(dt);
else
{
var grps = from row in dt.AsEnumerable()
let RefID = row.Field<string>("ReferenceID")
let RefDescription = row.Field<string>("ReferenceDescription")
let ReferenceUrl = row.Field<string>("ReferenceUrl")
let SortOrder = row.Field<int>("sortOrder")
group row by new { RefDescription, ReferenceUrl, SortOrder } into groups
select groups;
dt = grps.Select(g =>
{
DataRow first = g.First();
if (first.Field<string>("SameReferences") != null)
{
string duplicate = first.Field<int>("SortOrder").ToString();
first.SetField("SameReferences", string.Format("{0},{1}", duplicate, first.Field<string>("SameReferences")));
}
return first;
}).CopyToDataTable();
}
After applying the LINQ above, the DataTable becomes:
The expected DataTable is shown below: the comma should be eliminated when the SameReferences column holds a single value. What changes do I have to make to the LINQ to get the expected output?
Please help!
You can use the String.Trim method like this:
first.SetField("SameReferences", string.Format("{0},{1}", duplicate,
first.Field<string>("SameReferences")).Trim(','));
It removes any leading or trailing commas from the formatted string.
Try this:
if (first.Field<string>("SameReferences") != null)
{
string duplicate = first.Field<int>("SortOrder").ToString();
string sameReference = first.Field<string>("SameReferences");
if (String.IsNullOrEmpty(sameReference))
first.SetField("SameReferences", duplicate);
else
first.SetField("SameReferences", string.Format("{0},{1}", duplicate, sameReference));
}
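Another way to avoid the stray comma, shown here only as a sketch, is to join the non-empty parts with string.Join instead of formatting them unconditionally. The class ReferenceFormatter and the method Combine are hypothetical names, not part of the original code.

```csharp
using System;
using System.Linq;

public static class ReferenceFormatter
{
    // Hypothetical helper: joins the sort order and the existing references,
    // skipping null or empty parts so that no stray comma appears.
    public static string Combine(string sortOrder, string sameReferences)
    {
        var parts = new[] { sortOrder, sameReferences }
            .Where(part => !string.IsNullOrEmpty(part));
        return string.Join(",", parts);
    }
}
```

Inside the lambda, this could be used as first.SetField("SameReferences", ReferenceFormatter.Combine(duplicate, first.Field<string>("SameReferences"))).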
I'm creating DataTables from .csv files. That part works. My current issue is the following:
I have to compare two or more DataTables with the same structure. For example:
Datatable1:
KeyColumn, ValueColumn
KeyA, ValueA
KeyB, ValueB
KeyC, ValueC
Datatable2:
KeyColumn, ValueColumn
KeyB, ValueB
KeyC, ValueC
KeyD, ValueD
And this should end up like this:
ResultDatatable:
KeyColumn, ValueColumn (of DT1), ValueColumn (of DT2)
KeyA, ValueA
KeyB, ValueB (of DT1), ValueB (of DT2)
KeyC, ValueC (of DT1), ValueC (of DT2)
KeyD, ValueD
I can't even manage to insert the data of the first DataTable because of the different column names. Another problem is that the DataTables have the same column names, so I can't add those columns to the ResultDatatable.
I have tried many approaches and ended up with no solution. Any ideas how to address this problem?
Edit:
The solution with dictionaries was too sophisticated for my case, so I continued trying to solve it with the DataTables. The source of the problem was something very unexpected.
In my case, renaming a column to a name that contains a simple dot ('.') resulted in losing all data in that column.
e.g. If you have Datatable dt:
PrimaryColumn, ValueColumn
KeyA1, KeyB1
KeyA2, KeyB2
After dt.Columns["ValueColumn"].ColumnName = "Value.Column"; you lose all data in that column. I will ask MS whether this is intended or a bug in the .NET Framework.
Here is my final code (C#). I have a List<string> keys, whose columns remain in the resultTable, and a List<string> values, whose columns are added for every table that should be compared.
private DataTable CompareTables(List<AnalyseFile> files, Query query, List<string> keys, List<string> values) {
// Add first table completely to resultTable
DataTable resultTable =
files[0].GetDataTable(false, query.Header, query.Startstring, query.Endstring, query.Key).Copy();
foreach (string value in values) {
resultTable.Columns[value].ColumnName = "(" + files[0].getFileNameWithoutExtension() + ") " + value;
}
// Set primary keys
resultTable.PrimaryKey = keys.Select(key => resultTable.Columns[key]).ToArray();
// process remaining tables
for (int i = 1; i < files.Count; i++) {
DataTable currentTable = files[i].GetDataTable(false, query.Header, query.Startstring, query.Endstring, query.Key);
// Add value-columns to the resultTable
foreach (string value in values) {
resultTable.Columns.Add("(" + files[i].getFileNameWithoutExtension() + ") " + value);
}
// Set again primary keys
currentTable.PrimaryKey = keys.Select(key => currentTable.Columns[key]).ToArray();
// populate common Rows
foreach (DataRow dataRow in resultTable.Rows) {
foreach (DataRow row in currentTable.Rows) {
foreach (string key in keys) {
if (dataRow[key].ToString().Equals(row[key].ToString())) {
foreach (string value in values) {
string colname = "(" + files[i].getFileNameWithoutExtension() + ") " + value;
dataRow[colname] = row[value];
}
}
}
}
}
// Get all Rows, which do not exist in resultTable yet
IEnumerable<string> isNotinDT =
currentTable.AsEnumerable()
.Select(row => row.Field<string>(keys[0]))
.Except(resultTable.AsEnumerable().Select(row => row.Field<string>(keys[0])));
// Add all the non-existing rows to resultTable
foreach (string row in isNotinDT) {
DataRow currentRow = currentTable.Rows.Find(row);
DataRow dRow = resultTable.NewRow();
foreach (string key in keys) {
dRow[key] = currentRow[key];
}
foreach (string value in values) {
dRow["(" + files[i].getFileNameWithoutExtension() + ") " + value] = currentRow[value];
}
resultTable.Rows.Add(dRow);
}
}
return resultTable;
}
Any improvements are Welcome!
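One possible improvement, sketched here under simplifying assumptions, is to index each table by key once instead of comparing every row with every other row. The sketch handles only two tables with a single string key column and a single value column, assumes keys are unique within each table, and uses hypothetical names (TableComparer, FullOuterJoin, and the " (DT1)" / " (DT2)" column suffixes) that are not part of the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class TableComparer
{
    // Hypothetical helper: full outer join of two tables on keyColumn,
    // keeping valueColumn of each table in its own result column.
    public static DataTable FullOuterJoin(DataTable left, DataTable right,
                                          string keyColumn, string valueColumn)
    {
        var result = new DataTable();
        result.Columns.Add(keyColumn, typeof(string));
        result.Columns.Add(valueColumn + " (DT1)", typeof(string));
        result.Columns.Add(valueColumn + " (DT2)", typeof(string));

        // Index both tables by key (assumes unique keys; ToDictionary throws otherwise).
        var leftByKey = left.Rows.Cast<DataRow>()
            .ToDictionary(r => r[keyColumn].ToString(), r => r[valueColumn].ToString());
        var rightByKey = right.Rows.Cast<DataRow>()
            .ToDictionary(r => r[keyColumn].ToString(), r => r[valueColumn].ToString());

        // Every key that occurs in either table produces exactly one result row.
        foreach (string key in leftByKey.Keys.Union(rightByKey.Keys))
        {
            string leftValue, rightValue;
            result.Rows.Add(
                key,
                leftByKey.TryGetValue(key, out leftValue) ? (object)leftValue : DBNull.Value,
                rightByKey.TryGetValue(key, out rightValue) ? (object)rightValue : DBNull.Value);
        }
        return result;
    }
}
```

Cells for keys that are missing on one side are left as DBNull, which matches the empty cells in the ResultDatatable layout from the question.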
OK, here is an example of my version using dictionaries.
Fiddle: http://dotnetfiddle.net/AljK9J
//Setup Sample Data
var data1 = new Dictionary<string, string>();
data1.Add("KeyA", "ValueA");
data1.Add("KeyB", "ValueB");
data1.Add("KeyC", "ValueC");
var data2 = new Dictionary<string, string>();
data2.Add("KeyB", "ValueB");
data2.Add("KeyC", "ValueC");
data2.Add("KeyD", "ValueD");
//Second DataType in the Dictionary could be something other than a Tuple
var result = new Dictionary<string, Tuple<string, string>>();
//Fill in for items existing only in data1 and in both data1 and data2
foreach(var item in data1)
{
string valueInData2;
data2.TryGetValue(item.Key, out valueInData2); //stays null when the key is missing in data2
result.Add(item.Key, new Tuple<string, string>(item.Value, valueInData2));
}
//Fill in remaining items that exist only in data2
foreach(var item in data2.Where(d2 => !result.ContainsKey(d2.Key)))
{
result.Add(item.Key, new Tuple<string, string>(null, item.Value));
}
//Demonstrating how to access the data
var formattedOutput = result.Select(x => string.Format("{0}, {1} (of D1), {2} (of D2)", x.Key, x.Value.Item1 ?? "NoValue", x.Value.Item2 ?? "NoValue"));
foreach(var line in formattedOutput)
{
Console.WriteLine(line);
}
Above is a screenshot of one of my DataTables. I am trying to transform this data into the following format so that I can bind it to one of my grids. I have tried LINQ, but without success.
Could anyone please help me with how to do this? It doesn't necessarily have to be LINQ, but I think it will be easier with LINQ.
Try the following:
var result = dataSet.Tables["reportColumns"].AsEnumerable().GroupBy(x => x.Field<string>("Object"))
.Select(g => new
{
ColumnName = g.Key,
DefaultColumn = g.FirstOrDefault(p => p.Field<string>("Attribute") == "DefaultColumn").Field<string>("Value"),
Label = g.FirstOrDefault(p => p.Field<string>("Attribute") == "Label").Field<string>("Value"),
Type = g.FirstOrDefault(p => p.Field<string>("Attribute") == "Type").Field<string>("Value"),
Standard = g.FirstOrDefault().Field<int>("Standard")
}).ToList();
You can use my ToPivotTable extension:
public static DataTable ToPivotTable<T, TColumn, TRow, TData>(
this IEnumerable<T> source,
Func<T, TColumn> columnSelector,
Expression<Func<T, TRow>> rowSelector,
Func<IEnumerable<T>, TData> dataSelector)
{
DataTable table = new DataTable();
var rowName = ((MemberExpression)rowSelector.Body).Member.Name;
table.Columns.Add(new DataColumn(rowName));
var columns = source.Select(columnSelector).Distinct();
foreach (var column in columns)
table.Columns.Add(new DataColumn(column.ToString()));
var rows = source.GroupBy(rowSelector.Compile())
.Select(rowGroup => new
{
Key = rowGroup.Key,
Values = columns.GroupJoin(
rowGroup,
c => c,
r => columnSelector(r),
(c, columnGroup) => dataSelector(columnGroup))
});
foreach (var row in rows)
{
var dataRow = table.NewRow();
var items = row.Values.Cast<object>().ToList();
items.Insert(0, row.Key);
dataRow.ItemArray = items.ToArray();
table.Rows.Add(dataRow);
}
return table;
}
Create strongly-typed data from your source table:
var data = from r in table.AsEnumerable()
select new {
Object = r.Field<string>("Object"),
Attribute = r.Field<string>("Attribute"),
Value = r.Field<object>("Value")
};
And convert them to pivot table:
var pivotTable = data.ToPivotTable(r => r.Attribute,
r => r.Object,
rows => rows.First().Value);
This creates a pivot table with the distinct values of Attribute (i.e. DefaultColumn, Label, Type) as columns. There is one row per distinct Object value, and each cell holds the Value of the first source row that matches that Object and that Attribute.
Or in a single query:
var pivotTable = table.AsEnumerable()
.Select(r => new {
Object = r.Field<string>("Object"),
Attribute = r.Field<string>("Attribute"),
Value = r.Field<object>("Value")
})
.ToPivotTable(r => r.Attribute,
r => r.Object,
rows => rows.First().Value);
Could you please look at what is wrong? I want to convert the DataRows to a list of strings.
public List<string> GetEmailList()
{
// get DataTable dt from somewhere.
List<DataRow> drlist = dt.AsEnumerable().ToList();
List<string> sEmail = new List<string>();
foreach (object str in drlist)
{
sEmail.Add(str.ToString()); // exception
}
return sEmail; // Ultimately to get a string list.
}
Thanks for help.
There are several problems here, but the biggest one is that you're trying to turn an entire row into a string, when really you should be turning just a single cell into a string. You need to reference the first column of that DataRow, which you can do with brackets (like an array).
Try something like this instead:
public List<string> GetEmailList()
{
// get DataTable dt from somewhere.
List<string> sEmail = new List<string>();
foreach (DataRow row in dt.Rows)
{
sEmail.Add(row[0].ToString());
}
return sEmail; // Ultimately to get a string list.
}
Here's a one-liner I got from ReSharper on how to do this. I'm not sure of the performance implications; I just thought I'd share it.
List<string> companyProxies = Enumerable.Select(
vendors.Tables[0].AsEnumerable(), vendor => vendor["CompanyName"].ToString()).ToList();
Here is how it looks as a one-liner, with all the additional syntax noise stripped:
public List<string> GetEmailList()
{
return dt.AsEnumerable().Select(r => r[0].ToString()).ToList();
}
The Linq way ...
private static void Main(string[] args)
{
var dt = getdt();
var output = dt
.Rows
.Cast<DataRow>()
.ToList();
// or a CSV line
var csv = dt
.Rows
.Cast<DataRow>()
.Aggregate(new StringBuilder(), (sb, dr) => sb.AppendFormat(",{0}", dr[0]))
.Remove(0, 1);
Console.WriteLine(csv);
Console.ReadLine();
}
private static DataTable getdt()
{
var dc = new DataColumn("column1");
var dt = new DataTable("table1");
dt.Columns.Add(dc);
Enumerable.Range(0, 10)
.AsParallel()
.Select(i => string.Format("row {0}", i))
.ToList()
.ForEach(s =>
{
var dr = dt.NewRow();
dr[dc] = s;
dt.Rows.Add(dr);
});
return dt;
}
OK, so I've got a DataTable. Here's the schema:
DataTable dt = new DataTable();
dt.Columns.Add("word", typeof(string));
dt.Columns.Add("pronunciation", typeof(string));
The table is already filled, and I'm trying to write a LINQ query so that I can output something like this to the console (or anywhere else):
Pronunciation : akses9~R => (list of words)
I want to output the most common pronunciations and all the words that use each of them.
Something like this should give you what you want:
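// A DataTable is not IEnumerable<T>, so go through AsEnumerable() and read columns with Field<T>()
var results = dt.AsEnumerable().GroupBy(dr => dr.Field<string>("pronunciation"));
foreach(var result in results)
{
Console.Write("Pronunciation : {0} =>", result.Key);
foreach(var row in result)
{
Console.Write("{0} ", row.Field<string>("word"));
}
Console.WriteLine();
}
The GroupBy gives you a sequence of IGrouping objects. The Key property of each group contains the pronunciation, and the group itself contains all the rows (and therefore the words) that share it.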
Sounds like you want a group by:
var q =
from row in dt.Rows.Cast<DataRow>()
let val = new { Word = (string)row["word"], Pronunciation = (string)row["pronunciation"] }
group val by val.Pronunciation into g
select g;
foreach (var group in q)
{
Console.WriteLine(
"Pronunciation : {0} => ({1})",
group.Key,
String.Join(", ", group.Select(x => x.Word).ToArray()));
}
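Since the question asks for the most common pronunciations, the groups can additionally be ordered by their size. The following is a sketch only: PronunciationReport and Build are hypothetical names, and it assumes that neither column contains DBNull values (the casts would throw otherwise).

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class PronunciationReport
{
    // Hypothetical helper: builds one output line per pronunciation,
    // with the most frequently used pronunciation first.
    public static List<string> Build(DataTable dt)
    {
        return dt.Rows
            .Cast<DataRow>()
            .GroupBy(row => (string)row["pronunciation"])
            .OrderByDescending(group => group.Count())   // most common first
            .Select(group => string.Format(
                "Pronunciation : {0} => ({1})",
                group.Key,
                string.Join(", ", group.Select(row => (string)row["word"]))))
            .ToList();
    }
}
```

Each returned line can then be written with Console.WriteLine; adding .Take(n) after the ordering would limit the output to the n most common pronunciations.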
var words = from row in dt.AsEnumerable()
where row.Field<string>("pronunciation") == "akses9~R"
select row.Field<string>("word");
foreach (string word in words)
{
Console.WriteLine(word);
}