Convert datarow(only single column) to a string list - c#

What is wrong with this code? I want to convert a DataRow (single column) to a list of strings.
public List<string> GetEmailList()
{
// get DataTable dt from somewhere.
List<DataRow> drlist = dt.AsEnumerable().ToList();
List<string> sEmail = new List<string>();
foreach (object str in drlist)
{
sEmail.Add(str.ToString()); // exception
}
return sEmail; // Ultimately to get a string list.
}
Thanks for help.

There are several problems here, but the biggest one is that you're trying to turn an entire row into a string, when you should be turning just a single cell into a string. You need to reference the first column of that DataRow, which you can do with brackets (like an array).
Try something like this instead:
public List<string> GetEmailList()
{
// get DataTable dt from somewhere.
List<string> sEmail = new List<string>();
foreach (DataRow row in dt.Rows)
{
sEmail.Add(row[0].ToString());
}
return sEmail; // Ultimately to get a string list.
}

Here's a one liner I got from ReSharper on how to do this, not sure of the performance implications, just thought I'd share it.
List<string> companyProxies = Enumerable.Select(
vendors.Tables[0].AsEnumerable(), vendor => vendor["CompanyName"].ToString()).ToList();

Here is how it would be with all additional syntax noise stripped: one-liner.
public List<string> GetEmailList()
{
return dt.AsEnumerable().Select(r => r[0].ToString()).ToList();
}

The Linq way ...
private static void Main(string[] args)
{
var dt = getdt();
var output = dt
.Rows
.Cast<DataRow>()
.Select(dr => dr[0].ToString()) // project each row to its first cell as a string
.ToList();
// or a CSV line
var csv = dt
.Rows
.Cast<DataRow>()
.Aggregate(new StringBuilder(), (sb, dr) => sb.AppendFormat(",{0}", dr[0]))
.Remove(0, 1);
Console.WriteLine(csv);
Console.ReadLine();
}
private static DataTable getdt()
{
var dc = new DataColumn("column1");
var dt = new DataTable("table1");
dt.Columns.Add(dc);
Enumerable.Range(0, 10)
.AsParallel()
.Select(i => string.Format("row {0}", i))
.ToList()
.ForEach(s =>
{
var dr = dt.NewRow();
dr[dc] = s;
dt.Rows.Add(dr);
});
return dt;
}

Related

Order list by date and sequence number

I have been learning C# (I am relatively new) and I have a list of input files whose names follow the format "inputFile_dateSequence_sequenceNumber.xml". The code that I am using to sort the file list in ascending order is below:
using System;
using System.Collections.Generic;
using System.Linq;
public class Program
{
public static void Main()
{
string[] inputfiles = { "inputFile_2020-04-10_1.xml",
"inputFile_2020-04-10_2.xml",
"inputFile_2020-04-10_4.xml",
"inputFile_2020-04-10_3.xml",
"inputFile_2020-04-10_10.xml",
"inputFile_2020-05-10_1.xml",
"inputFile_2020-05-10_2.xml",
"inputFile_2020-05-10_10.xml",
"inputFile_2020-05-10_11.xml" };
List<string> stringList = new List<string>();
foreach (string s in inputfiles)
{
string bz = s.Split('.')[0];
stringList.Add(bz);
}
string[] Separator = new string[] { "_" };
var sortedList = stringList.OrderBy(i => i).ThenBy(s => int.Parse(s.Split(Separator, StringSplitOptions.None)[2])).ToList();
foreach (string i in sortedList)
{
Console.WriteLine(i);
}
}
}
But in ascending order, I am getting output as below:
inputFile_2020-04-10_1
inputFile_2020-04-10_10
inputFile_2020-04-10_2
inputFile_2020-04-10_3
inputFile_2020-04-10_4
inputFile_2020-05-10_1
inputFile_2020-05-10_10
inputFile_2020-05-10_11
inputFile_2020-05-10_2
but my desired output is like below:
inputFile_2020-04-10_1.xml
inputFile_2020-04-10_2.xml
inputFile_2020-04-10_3.xml
inputFile_2020-04-10_4.xml
inputFile_2020-04-10_10.xml
inputFile_2020-05-10_1.xml
inputFile_2020-05-10_2.xml
inputFile_2020-05-10_10.xml
inputFile_2020-05-10_11.xml
What modification does the code need in order to produce this output?
You could achieve your need by using regex:
var sortedList = stringList.OrderBy(x => Regex.Replace(x, @"\d+", m => m.Value.PadLeft(10, '0')));
Loads of ways to solve this, as you can see...
You can order first by just the date part of the name, then by the length of the name string (so shorter numbers like 1 and 7 sort before longer numbers like 10 and 17), then by the name itself:
.OrderBy(x => x.Remove(20))
.ThenBy(x=>x.Length)
.ThenBy(x=>x)
Perhaps though you'd parse the entire thing:
class MyFile{
public string FullName {get;set;}
public string Name {get;set;}
public DateTime Date {get;set;}
public int Num {get;set;}
public MyFile(string fullname){
// Path lives in System.IO
var bits = Path.GetFileNameWithoutExtension(fullname).Split('_');
FullName = fullname; // keep the original name, extension included
Name = bits[0];
Date = DateTime.Parse(bits[1]);
Num = int.Parse(bits[2]);
}
}
Then
var parsed = inputfiles.Select(x => new MyFile(x));
Now you can OrderBy that:
parsed.OrderBy(m => m.Date).ThenBy(m => m.Num);
Try to avoid doing everything at some base level of string/int primitive; this is OO programming! 😀
Use the following code:
var sortedList = stringList
.OrderBy(s => s.Substring(0, s.LastIndexOf('_'))) // sort by inputFile_dateSequence
.ThenBy(s => int.Parse(s.Substring(s.LastIndexOf('_') + 1))) // sort by sequenceNumber as integer
.ToList();
Update. If you want to preserve file extension, you can use the following:
List<string> sortedList = inputfiles
.Select(s =>
{
int nameSeparator = s.LastIndexOf('_');
int extSeparator = s.LastIndexOf('.');
return new
{
FullName = s,
BaseName = s.Substring(0, nameSeparator),
Sequence = int.Parse(s.Substring(nameSeparator + 1, extSeparator - nameSeparator - 1)),
Extension = s.Substring(extSeparator + 1)
};
})
.OrderBy(f => f.BaseName) // sort by inputFile_dateSequence
.ThenBy(f => f.Sequence) // sort by sequenceNumber
.ThenBy(f => f.Extension) // sort by file extension
.Select(f => f.FullName)
.ToList();
Using a DataTable and LINQ may make the task easier.
Below is code using a DataTable that generates your desired output:
public static void Main()
{
string[] inputfiles = { "inputFile_2020-04-10_1.xml",
"inputFile_2020-04-10_2.xml",
"inputFile_2020-04-10_4.xml",
"inputFile_2020-04-10_3.xml",
"inputFile_2020-04-10_10.xml",
"inputFile_2020-05-10_1.xml",
"inputFile_2020-05-10_2.xml",
"inputFile_2020-05-10_10.xml",
"inputFile_2020-05-10_11.xml" };
DataTable dt = new DataTable();
dt.Columns.Add("filename", typeof(string));
dt.Columns.Add("date", typeof(DateTime));
dt.Columns.Add("sequence", typeof(int));
foreach (string s in inputfiles)
{
DataRow dr = dt.NewRow();
dr[0] = s;
dr[1] = Convert.ToDateTime(s.Split('_')[1]);
dr[2] = Convert.ToInt32(s.Split('_')[2].Split('.')[0]);
dt.Rows.Add(dr);
}
DataTable sortedDT = dt.AsEnumerable()
.OrderBy(r => r.Field<DateTime>("date"))
.ThenBy(r => r.Field<int>("sequence"))
.CopyToDataTable();
foreach (DataRow dr in sortedDT.Rows)
{
Console.WriteLine(dr[0]);
}
}
Output:

Remove all columns from datatable except for 25

I have 500 Columns in my DataTable and I want to remove all of them except for 25 columns.
Is there any way to do this faster to save time and lines of code?
This is what I already tried:
private static void DeleteUselessColumns()
{
//This is example data!
List<DataColumn> dataColumnsToDelete = new List<DataColumn>();
DataTable bigData = new DataTable();
bigData.Columns.Add("Harry");
bigData.Columns.Add("Konstantin");
bigData.Columns.Add("George");
bigData.Columns.Add("Gabriel");
bigData.Columns.Add("Oscar");
bigData.Columns.Add("Muhammad");
bigData.Columns.Add("Emily");
bigData.Columns.Add("Olivia");
bigData.Columns.Add("Isla");
List<string> columnsToKeep = new List<string>();
columnsToKeep.Add("Isla");
columnsToKeep.Add("Oscar");
columnsToKeep.Add("Konstantin");
columnsToKeep.Add("Gabriel");
//This is the code i want to optimize------
foreach (DataColumn column in bigData.Columns)
{
bool keepColumn = false;
foreach (string s in columnsToKeep)
{
if (column.ColumnName.Equals(s))
{
keepColumn = true;
}
}
if (!keepColumn)
{
dataColumnsToDelete.Add(column);
}
}
foreach(DataColumn dataColumn in dataColumnsToDelete)
{
bigData.Columns.Remove(dataColumn);
}
//------------------------
}
var columnsToKeep = new List<string>() { "Isla", "Oscar", "Konstantin", "Gabriel"};
var toRemove = new List<DataColumn>();
foreach(DataColumn column in bigData.Columns)
{
if (!columnsToKeep.Any(name => column.ColumnName == name ))
{
toRemove.Add(column);
}
}
toRemove.ForEach(col => bigData.Columns.Remove(col));
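The Test1...Test9 setup code could be written as a loop. There is also no need to collect the columns to delete in a list: you can delete them in the first loop, as long as you iterate by index from the end (removing inside a foreach over bigData.Columns throws, because the collection is modified while it is being enumerated). As for performance, I'm not sure how to improve it.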
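A minimal sketch of deleting in a single pass, assuming it is fine to modify the table in place. The helper name KeepOnly and the class ColumnPruner are made up for illustration; only the DataTable calls come from the framework.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class ColumnPruner
{
    // Hypothetical helper: removes every column whose name is not in 'keep'.
    public static void KeepOnly(DataTable table, IEnumerable<string> keep)
    {
        // A HashSet gives constant-time name checks instead of scanning a list per column.
        var keepSet = new HashSet<string>(keep, StringComparer.Ordinal);

        // Walk backwards so that removing a column does not shift the indexes still to visit.
        for (int i = table.Columns.Count - 1; i >= 0; i--)
        {
            if (!keepSet.Contains(table.Columns[i].ColumnName))
            {
                table.Columns.RemoveAt(i);
            }
        }
    }
}
```

With the question's data this would be called as ColumnPruner.KeepOnly(bigData, columnsToKeep). Whether it is measurably faster than the two-loop version for 500 columns is something you would have to time yourself.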
You could try to use a DataView that selects the desired columns then copy to table. You need to experiment.
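One way the DataView idea might look, as a sketch rather than a benchmarked solution. It returns a new table instead of changing the original, and the names ColumnProjection and Project are invented for this example.

```csharp
using System.Data;

public static class ColumnProjection
{
    // Hypothetical helper: returns a new table containing only the named columns.
    public static DataTable Project(DataTable source, params string[] columnsToKeep)
    {
        // DataView.ToTable(distinct, columnNames) copies just the listed columns;
        // passing false for 'distinct' keeps duplicate rows.
        return new DataView(source).ToTable(false, columnsToKeep);
    }
}
```

Usage would be something like var small = ColumnProjection.Project(bigData, "Isla", "Oscar", "Konstantin", "Gabriel");. Note that this copies the row data, so for a very large table the memory cost is worth checking.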
If they have different names, create an array of strings:
var columns = new string[] { "Harry", "Konstantin","John"};
var columnsToKeep = new string[] { "John", "Konstantin"};
var columnsToDelete = from item in columns
where !columnsToKeep.Contains(item)
select item;
Or, using a lambda:
var columnsToDelete = columns
.Where (i=> !columnsToKeep.Contains(i))
.ToList();
toDelete
Harry

How can I improve the performance of the following code?

This code works but takes too much time. Every data table contains thousands of rows, and each time I need to filter data from the other data tables with respect to a column.
for (int i = 0; i < dsResult.Tables[0].Rows.Count; i++)
{
DataTable dtFiltered = dtWorkExp.Clone();
foreach (DataRow drr in dtWorkExp.Rows)
{
if (drr["UserId"].ToString() == dsResult.Tables[0].Rows[i]["Registration NO."].ToString())
{
dtFiltered.ImportRow(drr);
}
}
DataTable dtFilteredAward= dtAwards.Clone();
foreach (DataRow drr in dtAwards.Rows)
{
if (drr["UserId"].ToString() == dsResult.Tables[0].Rows[i]["Registration NO."].ToString())
{
dtFilteredAward.ImportRow(drr);
}
}
DataTable dtFilteredOtherQual = dtOtherQual.Clone();
foreach (DataRow drr in dtOtherQual.Rows)
{
if (drr["UserId"].ToString() == dsResult.Tables[0].Rows[i]["Registration NO."].ToString())
{
dtFilteredOtherQual.ImportRow(drr);
}
}
//Do some operation with filtered Data Tables
}
You can declare lines such as this one outside the for loop:
DataTable dtFiltered = dtWorkExp.Clone();
And instead of accessing dsResult.Tables[0] each time, you can assign it to a variable and use that.
You can also replace the foreach loop with LINQ.
What I would do:
All rows of the main datatable as enumerable:
var rows = dsResult.Tables[0].AsEnumerable();
Get the column you're going to filter with:
var filter = rows.Select(r => r.Field<string>("Registration NO."));
Create a method that accepts that filter, a table to filter and a field to compare.
public static DataTable Filter<T>(EnumerableRowCollection<T> filter, DataTable table, string fieldName)
{
return table.AsEnumerable().Where(r => filter.Contains(r.Field<T>(fieldName))).CopyToDataTable();
}
Finally use the method to filter all tables:
var dtFiltered = Filter<string>(filter, dtWorkExp, "UserId");
var dtFilteredAward = Filter<string>(filter, dtAwards, "UserId");
var dtFilteredOtherQual = Filter<string>(filter, dtOtherQual, "UserId");
All together it would be something like this:
public void YourMethod()
{
var rows = dsResult.Tables[0].AsEnumerable();
var filter = rows.Select(r => r.Field<string>("Registration NO."));
var dtFiltered = Filter<string>(filter, dtWorkExp, "UserId");
var dtFilteredAward = Filter<string>(filter, dtAwards, "UserId");
var dtFilteredOtherQual = Filter<string>(filter, dtOtherQual, "UserId");
}
public static DataTable Filter<T>(EnumerableRowCollection<T> filter, DataTable table, string fieldName)
{
return table.AsEnumerable().Where(r => filter.Contains(r.Field<T>(fieldName))).CopyToDataTable();
}
Put the value of the expression in a variable.
var regNo = dsResult.Tables[0].Rows[i]["Registration NO."].ToString();
Store the column's index in a variable. Access by index is faster than access by column name.
int index = dtWorkExp.Columns["UserId"].Ordinal;
Result code:
int dtWorkIndex = dtWorkExp.Columns["UserId"].Ordinal;
int dtAwardsIndex = dtAwards.Columns["UserId"].Ordinal;
int dtOtherQualIdex = dtOtherQual.Columns["UserId"].Ordinal;
for (int i = 0; i < dsResult.Tables[0].Rows.Count; i++)
{
var regNo = dsResult.Tables[0].Rows[i]["Registration NO."].ToString();
DataTable dtFiltered = dtWorkExp.Clone();
foreach (DataRow drr in dtWorkExp.Rows)
{
if (drr[dtWorkIndex].ToString() == regNo)
{
dtFiltered.ImportRow(drr);
}
}
...
Of course, the column index can be set as a constant if you know it exactly in advance. Also, if the UserId indexes match in all tables, a single variable is sufficient.
You can also try using the BeginLoadData and EndLoadData methods.
DataTable dtFiltered = dtWorkExp.Clone();
dtFiltered.BeginLoadData();
foreach (DataRow drr in dtWorkExp.Rows)
{
if (drr[dtWorkIndex].ToString() == regNo)
{
dtFiltered.ImportRow(drr);
}
}
dtFiltered.EndLoadData();
But I'm not sure if they make sense together with ImportRow.
Finally, parallelization can help.
for (int i = 0; i < dsResult.Tables[0].Rows.Count; i++)
{
var regNo = ...;
var workTask = Task.Run(() =>
{
DataTable dtFiltered = dtWorkExp.Clone();
foreach (DataRow drr in dtWorkExp.Rows)
{
if (drr[dtWorkIndex].ToString() == regNo)
{
dtFiltered.ImportRow(drr);
}
}
return dtFiltered;
});
var awardTask = Task.Run(() =>
...
var otherQualTask = Task.Run(() =>
...
//Task.WaitAll(workTask, awardTask, otherQualTask);
await Task.WhenAll(workTask, awardTask, otherQualTask);
//Do some operation with filtered Data Tables
}
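The suggestions above still scan every row of each table once per registration number. A different technique, not taken from any of the answers, is to group each table by UserId once and then do a keyed lookup inside the loop. The sketch below assumes the string form of UserId is what gets compared (as in the question); UserRowIndex, ByColumn and RowsFor are hypothetical names.

```csharp
using System;
using System.Data;
using System.Linq;

public static class UserRowIndex
{
    // Hypothetical helper: groups a table's rows by the string value of one column,
    // so each later search is a hash lookup rather than a full scan.
    public static ILookup<string, DataRow> ByColumn(DataTable table, string columnName)
    {
        int ordinal = table.Columns[columnName].Ordinal;
        return table.AsEnumerable().ToLookup(r => r[ordinal].ToString());
    }

    // Builds a table with the same schema as 'template' holding only the rows for 'key'.
    public static DataTable RowsFor(ILookup<string, DataRow> index, DataTable template, string key)
    {
        DataTable result = template.Clone();
        foreach (DataRow row in index[key]) // yields nothing when the key is absent
        {
            result.ImportRow(row);
        }
        return result;
    }
}
```

The idea would be to call UserRowIndex.ByColumn(dtWorkExp, "UserId") (and the same for dtAwards and dtOtherQual) once before the for loop, then call UserRowIndex.RowsFor(workIndex, dtWorkExp, regNo) inside it. How much this helps depends on the data, so measure before and after.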

Convert DataTable into a Dictionary using Linq/Lambda

I have a DataTable that I would like to convert into dictionary in C# for my project. I can use the traditional way of programming to achieve the goal but it is not as elegant as using linq/lambda. I tried to use Lambda but I got stuck in how to flatten multiple rows into 1.
I have a mock DataTable for testing purpose.
static DataTable GetData()
{
DataTable table = new DataTable();
table.Columns.Add("Field1", typeof(string));
table.Columns.Add("Field2", typeof(string));
table.Rows.Add("A", "A1");
table.Rows.Add("A", "A2");
table.Rows.Add("B", "B1");
table.Rows.Add("A", "A3");
table.Rows.Add("C", "C1");
table.Rows.Add("D", "D1");
table.Rows.Add("A", "A5");
return table;
}
My traditional way to convert it to Dictionary is:
Dictionary<string, ArrayList> t = new Dictionary<string, ArrayList>();
foreach (DataRow r in GetData().Rows)
{
string k = (string)r["Field1"];
string v = (string)r["Field2"];
if (!t.Keys.Contains(r["Field1"]))
{
t.Add(k, new ArrayList());
}
if (t.Values == null)
{
t[k] = new ArrayList();
}
t[k].Add(v);
}
How do I achieve the same thing with Linq?
I have tried:
var res = GetData()
.AsEnumerable()
.GroupBy(row => row.Field<string>("Field1"))
.Select(grp => grp.First());
This only gives me the first occurrence of the item. I am stuck.
Please help.
Actually, you don't want to convert it to a Dictionary, but to a Lookup. Here's an example:
var lookup = GetData().AsEnumerable()
.ToLookup(r => r.Field<string>("Field1"), r => r.Field<string>("Field2"));
foreach (var grouping in lookup)
{
Console.WriteLine(grouping.Key + ": " + String.Join(", ", grouping));
}
Output:
A: A1, A2, A3, A5
B: B1
C: C1
D: D1
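If you really do need a Dictionary rather than a Lookup, grouping first and then calling ToDictionary should get you there. This is a sketch that assumes both columns are typed as string and that the key column has no null values (ToDictionary rejects a null key); the class and method names are made up.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class TableGrouping
{
    // Hypothetical helper: maps each distinct key-column value to the list of
    // value-column entries that share it, in row order.
    public static Dictionary<string, List<string>> ToDictionaryOfLists(
        DataTable table, string keyColumn, string valueColumn)
    {
        return table.AsEnumerable()
            .GroupBy(r => r.Field<string>(keyColumn), r => r.Field<string>(valueColumn))
            .ToDictionary(g => g.Key, g => g.ToList());
    }
}
```

For the mock table in the question, TableGrouping.ToDictionaryOfLists(GetData(), "Field1", "Field2") would give a dictionary with the keys A, B, C and D, where the entry for A holds A1, A2, A3 and A5.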
Get Data from Datatable as Dictionary without Linq/Lambda
DataTable dataTable = GetData();
var data = new List<Dictionary<string, object>>();
foreach (DataRow dataTableRow in dataTable.Rows)
{
var dic = new Dictionary<string, object>();
foreach (DataColumn tableColumn in dataTable.Columns)
{
dic.Add(tableColumn.ColumnName, dataTableRow[tableColumn]);
}
data.Add(dic);
}
You can get a collection of key/value pairs:
var res = GetData()
.AsEnumerable()
.Select(grp => new KeyValuePair<string, string>(grp[0].ToString(), grp[1].ToString()));

Use linq to find DataTable(Name) in a DataSet using unique list of Column Names

I got roped into some old code, that uses loose (untyped) datasets all over the place.
I'm trying to write a helper method to find the DataTable.TableName using the names of some columns (because the original code has checks for "sometimes we have 2 datatables in a dataset, sometimes 3, sometimes 4", and it's hard to know the order). Basically, the TSQL Select statements conditionally run. (Gaaaaaaaaaaaaaahhh).
Anyway. I wrote the below, and if I give it 2 column names, it's matching on "any" column name, not "all column names".
It's probably my linq skillz (again), and probably a simple fix.
But I've tried to get the syntax sugar down. Below is one of the things I wrote that compiles.
private static void DataTableFindStuff()
{
DataSet ds = new DataSet();
DataTable dt1 = new DataTable("TableOne");
dt1.Columns.Add("Table1Column11");
dt1.Columns.Add("Name");
dt1.Columns.Add("Age");
dt1.Columns.Add("Height");
DataRow row1a = dt1.NewRow();
row1a["Table1Column11"] = "Table1Column11_ValueA";
row1a["Name"] = "Table1_Name_NameA";
row1a["Age"] = "AgeA";
row1a["Height"] = "HeightA";
dt1.Rows.Add(row1a);
DataRow row1b = dt1.NewRow();
row1b["Table1Column11"] = "Table1Column11_ValueB";
row1b["Name"] = "Table1_Name_NameB";
row1b["Age"] = "AgeB";
row1b["Height"] = "HeightB";
dt1.Rows.Add(row1b);
ds.Tables.Add(dt1);
DataTable dt2 = new DataTable("TableTwo");
dt2.Columns.Add("Table2Column21");
dt2.Columns.Add("Name");
dt2.Columns.Add("BirthCity");
dt2.Columns.Add("BirthState");
DataRow row2a = dt2.NewRow();
row2a["Table2Column21"] = "Table2Column1_ValueG";
row2a["Name"] = "Table2_Name_NameG";
row2a["BirthCity"] = "BirthCityA";
row2a["BirthState"] = "BirthStateA";
dt2.Rows.Add(row2a);
DataRow row2b = dt2.NewRow();
row2b["Table2Column21"] = "Table2Column1_ValueH";
row2b["Name"] = "Table2_Name_NameH";
row2b["BirthCity"] = "BirthCityB";
row2b["BirthState"] = "BirthStateB";
dt2.Rows.Add(row2b);
ds.Tables.Add(dt2);
DataTable dt3 = new DataTable("TableThree");
dt3.Columns.Add("Table3Column31");
dt3.Columns.Add("Name");
dt3.Columns.Add("Price");
dt3.Columns.Add("QuantityOnHand");
DataRow row3a = dt3.NewRow();
row3a["Table3Column31"] = "Table3Column31_ValueM";
row3a["Name"] = "Table3_Name_Name00M";
row3a["Price"] = "PriceA";
row3a["QuantityOnHand"] = "QuantityOnHandA";
dt3.Rows.Add(row3a);
DataRow row3b = dt3.NewRow();
row3b["Table3Column31"] = "Table3Column31_ValueN";
row3b["Name"] = "Table3_Name_Name00N";
row3b["Price"] = "PriceB";
row3b["QuantityOnHand"] = "QuantityOnHandB";
dt3.Rows.Add(row3b);
ds.Tables.Add(dt3);
string foundDataTable1Name = FindDataTableName(ds, new List<string> { "Table1Column11", "Name" });
/* foundDataTable1Name should be 'TableOne' */
string foundDataTable2Name = FindDataTableName(ds, new List<string> { "Table2Column21", "Name" });
/* foundDataTable2Name should be 'TableTwo' */
string foundDataTable3Name = FindDataTableName(ds, new List<string> { "Table3Column31", "Name" });
/* foundDataTable3Name should be 'TableThree' */
string foundDataTableThrowsExceptionName = FindDataTableName(ds, new List<string> { "Name" });
/* should throw exception as 'Name' is in multiple (distinct) tables */
}
public static string FindDataTableName(DataSet ds, List<string> columnNames)
{
string returnValue = string.Empty;
DataTable foundDataTable = FindDataTable(ds, columnNames);
if (null != foundDataTable)
{
returnValue = foundDataTable.TableName;
}
return returnValue;
}
public static DataTable FindDataTable(DataSet ds, List<string> columnNames)
{
DataTable returnItem = null;
if (null == ds || null == columnNames)
{
return null;
}
List<DataTable> tables =
ds.Tables
.Cast<DataTable>()
.SelectMany
(t => t.Columns.Cast<DataColumn>()
.Where(c => columnNames.Contains(c.ColumnName))
)
.Select(c => c.Table).Distinct().ToList();
if (null != tables)
{
if (tables.Count <= 1)
{
returnItem = tables.FirstOrDefault();
}
else
{
throw new IndexOutOfRangeException(string.Format("FindDataTable found more than one matching Table based on the input column names. ({0})", String.Join(", ", columnNames.ToArray())));
}
}
return returnItem;
}
I tried this too (to no avail) (always has 0 matches)
List<DataTable> tables =
ds.Tables
.Cast<DataTable>()
.Where
(t => t.Columns.Cast<DataColumn>()
.All(c => columnNames.Contains(c.ColumnName))
)
.Distinct().ToList();
It sounds to me like you're trying to check whether the columnNames passed to the method are all contained in the table's collection of column names. If that's the case, this should do the work.
List<DataTable> tables =
ds.Tables
.Cast<DataTable>()
.Where(dt => !columnNames.Except(dt.Columns.Select(c => c.Name)).Any())
.ToList();
(Below is an append by the asker of the question)
Well, I had to tweak it to make it compile, but you got me there..
Thanks.
Final Answer:
List<DataTable> tables =
ds.Tables.Cast<DataTable>()
.Where
(dt => !columnNames.Except(dt.Columns.Cast<DataColumn>()
.Select(c => c.ColumnName))
.Any()
)
.ToList();
Final Answer (which is not case sensitive):
List<DataTable> tables =
ds.Tables.Cast<DataTable>()
.Where
(dt => !columnNames.Except(dt.Columns.Cast<DataColumn>()
.Select(c => c.ColumnName), StringComparer.OrdinalIgnoreCase)
.Any()
)
.ToList();
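An alternative way to write the same "all column names" check, offered as a sketch rather than a replacement for the answer above: ask each table whether it contains every requested name. TableFinder and WithAllColumns are invented names. To my understanding DataColumnCollection.Contains compares names without regard to case, so treat this as the case-insensitive variant and verify that against your data.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class TableFinder
{
    // Hypothetical helper: returns every table that has all of the requested columns.
    public static List<DataTable> WithAllColumns(DataSet ds, IEnumerable<string> columnNames)
    {
        return ds.Tables.Cast<DataTable>()
            // DataColumnCollection.Contains(string) looks a column up by name.
            .Where(dt => columnNames.All(name => dt.Columns.Contains(name)))
            .ToList();
    }
}
```

With the sample DataSet from the question, passing "Table1Column11" and "Name" should leave only TableOne, while passing just "Name" returns all three tables, which is where the existing "more than one match" exception would kick in.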
