Order list by date and sequence number - c#

I am relatively new to C# and have a list of input files whose names follow the format "inputFile_dateSequence_sequenceNumber.xml". The code I am using to sort the file list in ascending order is below:
using System;
using System.Collections.Generic;
using System.Linq;
public class Program
{
public static void Main()
{
string[] inputfiles = { "inputFile_2020-04-10_1.xml",
"inputFile_2020-04-10_2.xml",
"inputFile_2020-04-10_4.xml",
"inputFile_2020-04-10_3.xml",
"inputFile_2020-04-10_10.xml",
"inputFile_2020-05-10_1.xml",
"inputFile_2020-05-10_2.xml",
"inputFile_2020-05-10_10.xml",
"inputFile_2020-05-10_11.xml" };
List<string> stringList = new List<string>();
foreach (string s in inputfiles)
{
string bz = s.Split('.')[0];
stringList.Add(bz);
}
string[] Separator = new string[] { "_" };
var sortedList = stringList.OrderBy(i => i).ThenBy(s => int.Parse(s.Split(Separator, StringSplitOptions.None)[2])).ToList();
foreach (string i in sortedList)
{
Console.WriteLine(i);
}
}
}
But in ascending order, I am getting output as below:
inputFile_2020-04-10_1
inputFile_2020-04-10_10
inputFile_2020-04-10_2
inputFile_2020-04-10_3
inputFile_2020-04-10_4
inputFile_2020-05-10_1
inputFile_2020-05-10_10
inputFile_2020-05-10_11
inputFile_2020-05-10_2
but my desired output is like below:
inputFile_2020-04-10_1.xml
inputFile_2020-04-10_2.xml
inputFile_2020-04-10_3.xml
inputFile_2020-04-10_4.xml
inputFile_2020-04-10_10.xml
inputFile_2020-05-10_1.xml
inputFile_2020-05-10_2.xml
inputFile_2020-05-10_10.xml
inputFile_2020-05-10_11.xml
What modification should the code need in order to get the output like this?

You can do this with a regex that zero-pads every number before comparing (requires using System.Text.RegularExpressions):
var sortedList = stringList.OrderBy(x => Regex.Replace(x, @"\d+", m => m.Value.PadLeft(10, '0')));
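To make the padding idea concrete, here is a minimal, self-contained sketch. The class and method names (PaddedSortDemo, SortByPaddedNumbers) are made up for illustration, and the width of 10 is an arbitrary choice that only has to exceed the longest number in the names. It sorts the original names, so the .xml extension is kept.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PaddedSortDemo
{
    // Hypothetical helper: sorts by a key in which every run of digits is
    // left-padded with zeros, so "_2" becomes "_0000000002" and compares
    // before "_0000000010". The original strings are returned unchanged.
    public static List<string> SortByPaddedNumbers(IEnumerable<string> names)
    {
        return names
            .OrderBy(n => Regex.Replace(n, @"\d+", m => m.Value.PadLeft(10, '0')),
                     StringComparer.Ordinal)
            .ToList();
    }

    public static void Main()
    {
        var files = new[]
        {
            "inputFile_2020-04-10_10.xml",
            "inputFile_2020-04-10_2.xml",
            "inputFile_2020-05-10_1.xml",
        };

        foreach (string name in SortByPaddedNumbers(files))
            Console.WriteLine(name);
    }
}
```

Because the date parts are padded too, this only orders dates correctly when they are written year first, as they are here.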

There are loads of ways to solve this, as you can see.
You can order first by just the date part of the name, then by the length of the name (so shorter numbers like 1 and 7 sort before longer ones like 10 and 17), then by the name itself:
.OrderBy(x => x.Remove(20))
.ThenBy(x=>x.Length)
.ThenBy(x=>x)
Perhaps though you'd parse the entire thing:
class MyFile{
    public string FullName {get;set;}
    public string Name {get;set;}
    public DateTime Date {get;set;}
    public int Num {get;set;}
    public MyFile(string fullname){
        // Requires using System.IO for Path.
        var bits = Path.GetFileNameWithoutExtension(fullname).Split('_');
        FullName = fullname; // keep the original name, including the extension
        Name = bits[0];
        Date = DateTime.Parse(bits[1]);
        Num = int.Parse(bits[2]);
    }
}
Then
var parsed = inputfiles.Select(x => new MyFile(x));
Now you can OrderBy that:
parsed.OrderBy(m => m.Date).ThenBy(m => m.Num);
Try to avoid doing everything at some base level of string/int primitive; this is OO programming! 😀
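As a rough end-to-end sketch of the "parse the entire thing" approach, the following returns the original file names in date and sequence order. The names ParsedSortDemo, ParseKey and Sort are hypothetical, and the "yyyy-MM-dd" format is an assumption based on the sample file names in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

public static class ParsedSortDemo
{
    // Splits a name like "inputFile_2020-04-10_3.xml" into its date and
    // sequence number. Assumes exactly three '_'-separated parts.
    public static (DateTime Date, int Sequence) ParseKey(string fileName)
    {
        string[] bits = Path.GetFileNameWithoutExtension(fileName).Split('_');
        DateTime date = DateTime.ParseExact(bits[1], "yyyy-MM-dd", CultureInfo.InvariantCulture);
        int sequence = int.Parse(bits[2], CultureInfo.InvariantCulture);
        return (date, sequence);
    }

    // Orders by the parsed date, then by the parsed number, and hands back
    // the untouched original names.
    public static List<string> Sort(IEnumerable<string> fileNames)
    {
        return fileNames
            .Select(name => new { Name = name, Key = ParseKey(name) })
            .OrderBy(x => x.Key.Date)
            .ThenBy(x => x.Key.Sequence)
            .Select(x => x.Name)
            .ToList();
    }

    public static void Main()
    {
        var files = new[]
        {
            "inputFile_2020-05-10_2.xml",
            "inputFile_2020-04-10_10.xml",
            "inputFile_2020-04-10_3.xml",
        };

        foreach (string name in Sort(files))
            Console.WriteLine(name);
    }
}
```

Parsing once into a key keeps the comparison logic out of the string handling, at the cost of throwing on any name that does not match the expected pattern.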

Use the following code:
var sortedList = stringList
.OrderBy(s => s.Substring(0, s.LastIndexOf('_'))) // sort by inputFile_dateSequence
.ThenBy(s => int.Parse(s.Substring(s.LastIndexOf('_') + 1))) // sort by sequenceNumber as integer
.ToList();
Update. If you want to preserve file extension, you can use the following:
List<string> sortedList = inputfiles
.Select(s =>
{
int nameSeparator = s.LastIndexOf('_');
int extSeparator = s.LastIndexOf('.');
return new
{
FullName = s,
BaseName = s.Substring(0, nameSeparator),
Sequence = int.Parse(s.Substring(nameSeparator + 1, extSeparator - nameSeparator - 1)),
Extension = s.Substring(extSeparator + 1)
};
})
.OrderBy(f => f.BaseName) // sort by inputFile_dateSequence
.ThenBy(f => f.Sequence) // sort by sequenceNumber
.ThenBy(f => f.Extension) // sort by file extension
.Select(f => f.FullName)
.ToList();

Using a DataTable with LINQ may make the task easier.
Below is code using a DataTable that generates your desired output (it requires using System.Data and using System.Linq):
public static void Main()
{
string[] inputfiles = { "inputFile_2020-04-10_1.xml",
"inputFile_2020-04-10_2.xml",
"inputFile_2020-04-10_4.xml",
"inputFile_2020-04-10_3.xml",
"inputFile_2020-04-10_10.xml",
"inputFile_2020-05-10_1.xml",
"inputFile_2020-05-10_2.xml",
"inputFile_2020-05-10_10.xml",
"inputFile_2020-05-10_11.xml" };
DataTable dt = new DataTable();
dt.Columns.Add("filename", typeof(string));
dt.Columns.Add("date", typeof(DateTime));
dt.Columns.Add("sequence", typeof(int));
foreach (string s in inputfiles)
{
DataRow dr = dt.NewRow();
dr[0] = s;
dr[1] = Convert.ToDateTime(s.Split('_')[1]);
dr[2] = Convert.ToInt32(s.Split('_')[2].Split('.')[0]);
dt.Rows.Add(dr);
}
DataTable sortedDT = dt.AsEnumerable()
.OrderBy(r => r.Field<DateTime>("date"))
.ThenBy(r => r.Field<int>("sequence"))
.CopyToDataTable();
foreach (DataRow dr in sortedDT.Rows)
{
Console.WriteLine(dr[0]);
}
}

Related

How do I find and list duplicate rows based on columns in a CSV file using C#. Matching/Grouping Rows.

I converted an excel file into a CSV file. The file contains over 100k records. I'm wanting to search and return duplicate rows by searching the full name column. If the full name's match up I want the program to return the entire rows of the duplicates. I started with a code that returns a list of full names but that's about it.
I've listed the code that I have now below:
public static void readCells()
{
var dictionary = new Dictionary<string, int>();
Console.WriteLine("started");
var counter = 1;
var readText = File.ReadAllLines(path);
var duplicatedValues = dictionary.GroupBy(fullName => fullName.Value).Where(fullName => fullName.Count() > 1);
foreach (var s in readText)
{
var values = s.Split(new Char[] { ',' });
var fullName = values[3];
if (!dictionary.ContainsKey(fullName))
{
dictionary.Add(fullName, 1);
}
else
{
dictionary[fullName] += 1;
}
Console.WriteLine("Full Name Is: " + values[3]);
counter++;
}
}
}
I changed the dictionary to use the full name as the key and to store every matching row:
public static void readCells()
{
    // Map each full name to every row (as a list of cells) that carries it.
    var dictionary = new Dictionary<string, List<List<string>>>();
    Console.WriteLine("started");
    var counter = 1;
    var readText = File.ReadAllLines(path);
    foreach (var s in readText)
    {
        List<string> values = s.Split(new Char[] { ',' }).ToList();
        string fullName = values[3];
        if (!dictionary.ContainsKey(fullName))
        {
            List<List<string>> newList = new List<List<string>>();
            newList.Add(values);
            dictionary.Add(fullName, newList);
        }
        else
        {
            dictionary[fullName].Add(values);
        }
        Console.WriteLine("Full Name Is: " + values[3]);
        counter++;
    }
    // Look for duplicates only after the dictionary is filled:
    // a name is duplicated when it maps to more than one row.
    var duplicatedValues = dictionary.Where(entry => entry.Value.Count > 1).ToList();
}
I've found that using Microsoft's built-in TextFieldParser (which you can use in C# despite its being in the Microsoft.VisualBasic.FileIO namespace) can simplify reading and parsing CSV files.
Using this type, your method ReadCells() can be rewritten as the following static method, which relies on a small extension method:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Microsoft.VisualBasic.FileIO;
public static class TextFieldParserExtensions
{
public static List<IGrouping<string, string[]>> ReadCellsWithDuplicatedCellValues(string path, int keyCellIndex, int nRowsToSkip = 0)
{
using (var stream = File.OpenRead(path))
using (var parser = new TextFieldParser(stream))
{
parser.SetDelimiters(new string[] { "," });
var values = parser.ReadAllFields()
// If your CSV file contains header row(s) you can skip them by passing a value for nRowsToSkip
.Skip(nRowsToSkip)
.GroupBy(row => row.ElementAtOrDefault(keyCellIndex))
.Where(g => g.Count() > 1)
.ToList();
return values;
}
}
public static IEnumerable<string[]> ReadAllFields(this TextFieldParser parser)
{
if (parser == null)
throw new ArgumentNullException();
while (!parser.EndOfData)
yield return parser.ReadFields();
}
}
Which you would call like:
var groups = TextFieldParserExtensions.ReadCellsWithDuplicatedCellValues(path, 3);
Notes:
TextFieldParser correctly handles cells with escaped, embedded commas which s.Split(new Char[] { ',' }) will not.
Since your CSV file has over 100k records I adopted a streaming strategy to avoid the intermediate string[] readText memory allocation.
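To illustrate the first note, here is a small sketch that parses one line with a quoted, comma-containing field both ways. QuotedFieldDemo and ParseLine are hypothetical names, and the sample line is invented for the example. On .NET Framework this needs a reference to the Microsoft.VisualBasic assembly.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class QuotedFieldDemo
{
    // Hypothetical helper: parses a single CSV line with TextFieldParser,
    // which treats a quoted field as one value even if it contains commas.
    public static string[] ParseLine(string csvLine)
    {
        using (var reader = new StringReader(csvLine))
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            return parser.ReadFields();
        }
    }

    public static void Main()
    {
        string line = "7,\"Smith, John\",Sales";

        // A plain Split cuts the quoted name in two.
        Console.WriteLine(line.Split(',').Length);   // 4
        // TextFieldParser keeps "Smith, John" together.
        Console.WriteLine(ParseLine(line).Length);   // 3
    }
}
```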
You can try out Cinchoo ETL - an open source library to parse CSV file and identify the duplicates with few lines of code.
Sample CSV file (EmpDuplicates.csv) below
Id,Name
1,Tom
2,Mark
3,Lou
3,Lou
4,Austin
4,Austin
4,Austin
Here is how you can parse and identify the duplicate records
using (var parser = new ChoCSVReader("EmpDuplicates.csv").WithFirstLineHeader())
{
foreach (dynamic c in parser.GroupBy(r => r.Id).Where(g => g.Count() > 1).Select(g => g.FirstOrDefault()))
Console.WriteLine(c.DumpAsJson());
}
Output:
{
"Id": 3,
"Name": "Lou"
}
{
"Id": 4,
"Name": "Austin"
}
Hope this helps.
For more detailed usage of this library, visit CodeProject article at https://www.codeproject.com/Articles/1145337/Cinchoo-ETL-CSV-Reader

Removing Items From List with Parameter of Matching String most Recent DateTime

I am trying to narrow down a list which is comprised of all files with matching product ID (eg M320.1215). When I say I need to narrow it down I want to remove the list entries in order to keep only the most recent items in the list.
This is an example of a file name: I_ATTRIBUTES_M320.1215_EGHS_CS_07112016225939.xlsx
Here you see the Product Id as "M320.1215"
The Subformat and Language "EGHS_CS"
And a date and time 07112016225939 in format MMDDYYYYHHMMSS. I can get the date time into DateTime object using:
public DateTime correctedDateString(string dts)
{
string correctDTS = dts.Insert(2, "/");
correctDTS = correctDTS.Insert(5, "/");
correctDTS = correctDTS.Insert(10, " ");
correctDTS = correctDTS.Insert(13, ":");
correctDTS = correctDTS.Insert(16, ":");
DateTime convertedDate = DateTime.Now;
try
{
convertedDate = Convert.ToDateTime(correctDTS);
Console.WriteLine("'{0}' converts to {1} {2} time.", correctDTS, convertedDate, convertedDate.Kind.ToString());
}
catch (FormatException)
{
convertedDate = Convert.ToDateTime("01/01/2015 00:00:00");
Console.WriteLine("'{0}' is not in the proper format.", correctDTS);
}
return convertedDate;
}
This is obviously a simple method.
I have been using the following to split the items in the list into usable segments:
string[] tempArray = Path.GetFileNameWithoutExtension(filenames[i].ToString()).ToString().Split(new[] { "_" }, StringSplitOptions.None);
Now what I am struggling with is to manipulate the following list to only keep the most recent versions of each subFormat and language combo.
List<string> filenames = new List<string>()
{
"I_ATTRIBUTES_M320.1215_EGHS_RU_07132016020215",
"I_ATTRIBUTES_M320.1215_EGHS_BE_06292016132122",
"I_ATTRIBUTES_M320.1215_EGHS_BE_06302016100039",
"I_ATTRIBUTES_M320.1215_EGHS_BE_07042016080530",
"I_ATTRIBUTES_M320.1215_EGHS_BE_07112016225936",
"I_ATTRIBUTES_M320.1215_EGHS_BE_07132016020203",
"I_ATTRIBUTES_M320.1215_EGHS_BR_06292016132127",
"I_ATTRIBUTES_M320.1215_EGHS_BR_06302016100042",
"I_ATTRIBUTES_M320.1215_EGHS_BR_07042016080536",
"I_ATTRIBUTES_M320.1215_EGHS_BR_07112016225938",
"I_ATTRIBUTES_M320.1215_EGHS_BR_07132016020206",
"I_ATTRIBUTES_M320.1215_EGHS_CS_07112016225939",
"I_ATTRIBUTES_M320.1215_EGHS_CS_07132016020207",
"I_ATTRIBUTES_M320.1215_EGHS_DE_06292016132128",
"I_ATTRIBUTES_M320.1215_EGHS_DE_06302016100044",
"I_ATTRIBUTES_M320.1215_EGHS_DE_07042016080537",
"I_ATTRIBUTES_M320.1215_EGHS_DE_07112016225940",
"I_ATTRIBUTES_M320.1215_EGHS_DE_07132016020208",
"I_ATTRIBUTES_M320.1215_EGHS_FR_06292016132129",
"I_ATTRIBUTES_M320.1215_EGHS_FR_06302016100045",
"I_ATTRIBUTES_M320.1215_EGHS_FR_07042016080538",
"I_ATTRIBUTES_M320.1215_EGHS_FR_07112016225941",
"I_ATTRIBUTES_M320.1215_EGHS_FR_07132016020210",
"I_ATTRIBUTES_M320.1215_EGHS_IT_06292016132129",
"I_ATTRIBUTES_M320.1215_EGHS_IT_06302016100046",
"I_ATTRIBUTES_M320.1215_EGHS_IT_07042016080539",
"I_ATTRIBUTES_M320.1215_EGHS_IT_07112016225941",
"I_ATTRIBUTES_M320.1215_EGHS_IT_07132016020211",
"I_ATTRIBUTES_M320.1215_EGHS_MS_06292016132130",
"I_ATTRIBUTES_M320.1215_EGHS_MS_06302016100047",
"I_ATTRIBUTES_M320.1215_EGHS_MS_07042016080540",
"I_ATTRIBUTES_M320.1215_EGHS_MS_07112016225943",
"I_ATTRIBUTES_M320.1215_EGHS_MS_07132016020212",
"I_ATTRIBUTES_M320.1215_EGHS_PL_06292016132131",
"I_ATTRIBUTES_M320.1215_EGHS_PL_06302016100048",
"I_ATTRIBUTES_M320.1215_EGHS_PL_07042016080541",
"I_ATTRIBUTES_M320.1215_EGHS_PL_07112016225944",
"I_ATTRIBUTES_M320.1215_EGHS_PL_07132016020214",
"I_ATTRIBUTES_M320.1215_EGHS_RU_06292016132131",
"I_ATTRIBUTES_M320.1215_EGHS_RU_06302016100049",
"I_ATTRIBUTES_M320.1215_EGHS_RU_07042016080542",
"I_ATTRIBUTES_M320.1215_EGHS_RU_07112016225945"
};
So essentially I need the final list to be as follows:
List<string> filenames = new List<string>()
{
"I_ATTRIBUTES_M320.1215_EGHS_BE_07132016020203",
"I_ATTRIBUTES_M320.1215_EGHS_BR_07132016020206",
"I_ATTRIBUTES_M320.1215_EGHS_CS_07132016020207",
"I_ATTRIBUTES_M320.1215_EGHS_DE_07132016020208",
"I_ATTRIBUTES_M320.1215_EGHS_FR_07132016020210",
"I_ATTRIBUTES_M320.1215_EGHS_IT_07132016020211",
"I_ATTRIBUTES_M320.1215_EGHS_MS_07132016020212",
"I_ATTRIBUTES_M320.1215_EGHS_PL_07132016020214",
"I_ATTRIBUTES_M320.1215_EGHS_RU_07132016020215"
};
Thank you in advance for any help.
You could do this with LINQ:
var grouped = filenames.Select(x => x.Split('_'))
.GroupBy(x => x[2] + x[3] + x[4], p => p, (key, g) => new { Id = key, Items = g.ToList() })
.Select(x => x.Items.OrderByDescending(i => correctedDateString(i[5])).FirstOrDefault())
.Select(x => string.Join("_", x))
.ToList();
Try this:
var result = filenames.Select(s =>
{
var splitted = s.Split('_');
return new
{
ProductId = splitted[2],
Subformat = splitted[3],
Language = splitted[4],
DateTime = DateTime.ParseExact(splitted[5], "MMddyyyyHHmmss", null),
Source = s
};
})
.GroupBy(a => new { a.ProductId, a.Subformat, a.Language })
.Select(g => g.First(a => a.DateTime == g.Max(b => b.DateTime)).Source)
.ToList();
I'm using the DateTime.ParseExact method instead of your correctedDateString method.
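For reference, a minimal sketch of parsing the timestamp on its own. TimestampDemo, ParseStamp and TryParseStamp are hypothetical names. Passing CultureInfo.InvariantCulture instead of null is my own choice here, so that the result does not depend on the machine's culture settings.

```csharp
using System;
using System.Globalization;

public static class TimestampDemo
{
    // Format taken from the question: MMDDYYYYHHMMSS, e.g. 07132016020215.
    private const string Format = "MMddyyyyHHmmss";

    // Throws FormatException when the text does not match the format.
    public static DateTime ParseStamp(string stamp)
    {
        return DateTime.ParseExact(stamp, Format, CultureInfo.InvariantCulture);
    }

    // Non-throwing variant for file names that may be malformed.
    public static bool TryParseStamp(string stamp, out DateTime value)
    {
        return DateTime.TryParseExact(stamp, Format, CultureInfo.InvariantCulture,
                                      DateTimeStyles.None, out value);
    }

    public static void Main()
    {
        DateTime parsed = ParseStamp("07132016020215");
        Console.WriteLine(parsed.ToString("yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture));

        // Month 13 does not exist, so this reports failure instead of throwing.
        Console.WriteLine(TryParseStamp("13132016020215", out _));
    }
}
```

The TryParseExact variant avoids the try/catch with a fallback date that the original correctedDateString method uses.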
You can use List<T>.Find(Predicate<T>) method to find the particular item and use List<T>.Remove(T) method to remove the selected item.
Example:
// Find an Employee by their ID.
Employee result = Employees.Find(
delegate(Employee emp)
{
return emp.ID == IDtoFind;
}
);

Optimization of nested loops using LINQ

Can you please suggest how to write an optimized LINQ query for the following operation?
foreach (DataRow entry1 in table1.Rows)
{
var columnA = entry1["ColumnA"] as string;
if (!string.IsNullOrEmpty(columnA))
{
foreach (string entry2 in table2)
{
var dataExists = table3.Any(rows3 =>
!string.IsNullOrEmpty(rows3[entry2] as string)
&& columnA.IsEqual(rows3["ColumnB"] as string));
if (dataExists)
{
entry1[entry2] = Compute(columnA, entry2);
}
}
}
}
I tried with this, but the results don't match in terms of the unique iteration counts.
var t2t3Pair = from entry2 in table2
let entry3 = table3.FirstOrDefault(x =>
!string.IsNullOrEmpty(x[entry2] as string))
where entry3 != null
select new { entry2, entry3 };
var t1t3Pair = from pair in t2t3Pair
from entry1 in table1.AsEnumerable()
let columnA = entry1["ColumnA"] as string
where !string.IsNullOrEmpty(columnA)
&& columnA.IsEqual(pair.entry3["ColumnB"] as string)
select new { Entry1Alias = entry1, Entry2Alias = pair.entry2 };
foreach (var pair in t1t3Pair)
{
var columnA = (string)pair.Entry1Alias["ColumnA"];
pair.Entry1Alias[pair.Entry2Alias] = Compute(columnA, pair.Entry2Alias);
}
Note: IsEqual is my extension method to compare string without case sensitivity.
Apparently the bottleneck is the line
var dataExists = table3.Any(rows3 =>
!string.IsNullOrEmpty(rows3[entry2] as string)
&& columnA.IsEqual(rows3["ColumnB"] as string));
which is executed inside the innermost loop.
As usual, it can be optimized by preparing in advance a fast lookup data structure and use it inside the critical loop.
For your case, I would suggest something like this:
var dataExistsMap = table3.AsEnumerable()
// The keys are later looked up with columnA, so the comparer must match IsEqual.
// OrdinalIgnoreCase is an assumption; use whatever comparison IsEqual really performs.
.GroupBy(r => r["ColumnB"] as string, StringComparer.OrdinalIgnoreCase)
.Where(g => !string.IsNullOrEmpty(g.Key))
.ToDictionary(g => g.Key, g => new HashSet<string>(
table2.Where(e => g.Any(r => !string.IsNullOrEmpty(r[e] as string)))
),
StringComparer.OrdinalIgnoreCase
);
foreach (DataRow entry1 in table1.Rows)
{
var columnA = entry1["ColumnA"] as string;
if (string.IsNullOrEmpty(columnA)) continue;
HashSet<string> dataExistsSet;
if (!dataExistsMap.TryGetValue(columnA, out dataExistsSet)) continue;
foreach (string entry2 in table2.Where(dataExistsSet.Contains))
entry1[entry2] = Compute(columnA, entry2);
}
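The precomputed-lookup idea can be tried out without any DataTable. The sketch below is only an illustration under stated assumptions: rows are modelled as plain dictionaries, LookupDemo, Cell and BuildMap are hypothetical names, and case-insensitive matching is assumed to mean StringComparer.OrdinalIgnoreCase.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LookupDemo
{
    // Returns the cell value, or null when the row has no such column.
    private static string Cell(Dictionary<string, string> row, string column)
    {
        return row.TryGetValue(column, out string value) ? value : null;
    }

    // Builds, once, a map from each ColumnB value to the set of column names
    // that are non-empty in at least one row carrying that ColumnB value.
    public static Dictionary<string, HashSet<string>> BuildMap(
        IEnumerable<Dictionary<string, string>> rows,
        IEnumerable<string> columnNames)
    {
        List<string> names = columnNames.ToList();
        return rows
            .Where(r => !string.IsNullOrEmpty(Cell(r, "ColumnB")))
            .GroupBy(r => Cell(r, "ColumnB"), StringComparer.OrdinalIgnoreCase)
            .ToDictionary(
                g => g.Key,
                g => new HashSet<string>(
                    names.Where(n => g.Any(r => !string.IsNullOrEmpty(Cell(r, n))))),
                StringComparer.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        var rows = new List<Dictionary<string, string>>
        {
            new Dictionary<string, string> { ["ColumnB"] = "alpha", ["X"] = "1", ["Y"] = "" },
            new Dictionary<string, string> { ["ColumnB"] = "ALPHA", ["X"] = "",  ["Y"] = "2" },
            new Dictionary<string, string> { ["ColumnB"] = "beta",  ["X"] = "",  ["Y"] = "" },
        };

        var map = BuildMap(rows, new[] { "X", "Y" });

        // Inside the hot loop, each check is now a dictionary and set lookup
        // instead of a scan over every row.
        Console.WriteLine(map.TryGetValue("Alpha", out var set) && set.Contains("Y"));
    }
}
```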

Use linq to find DataTable(Name) in a DataSet using unique list of Column Names

I got roped into some old code, that uses loose (untyped) datasets all over the place.
I'm trying to write a helper method to find the DataTable.Name using the names of some columns (because the original code has checks for "sometimes we have 2 datatables in a dataset, sometimes 3, sometimes 4"), and it's hard to know the order. Basically, the TSQL Select statements conditionally run. (Gaaaaaaaaaaaaaahhh).
Anyway. I wrote the below, and if I give it 2 column names, it matches on "any" column name, not "all column names".
It's probably my linq skillz (again), and probably a simple fix.
But I've tried to get the syntax sugar down. Below is one of the things I wrote that compiles.
private static void DataTableFindStuff()
{
DataSet ds = new DataSet();
DataTable dt1 = new DataTable("TableOne");
dt1.Columns.Add("Table1Column11");
dt1.Columns.Add("Name");
dt1.Columns.Add("Age");
dt1.Columns.Add("Height");
DataRow row1a = dt1.NewRow();
row1a["Table1Column11"] = "Table1Column11_ValueA";
row1a["Name"] = "Table1_Name_NameA";
row1a["Age"] = "AgeA";
row1a["Height"] = "HeightA";
dt1.Rows.Add(row1a);
DataRow row1b = dt1.NewRow();
row1b["Table1Column11"] = "Table1Column11_ValueB";
row1b["Name"] = "Table1_Name_NameB";
row1b["Age"] = "AgeB";
row1b["Height"] = "HeightB";
dt1.Rows.Add(row1b);
ds.Tables.Add(dt1);
DataTable dt2 = new DataTable("TableTwo");
dt2.Columns.Add("Table2Column21");
dt2.Columns.Add("Name");
dt2.Columns.Add("BirthCity");
dt2.Columns.Add("BirthState");
DataRow row2a = dt2.NewRow();
row2a["Table2Column21"] = "Table2Column1_ValueG";
row2a["Name"] = "Table2_Name_NameG";
row2a["BirthCity"] = "BirthCityA";
row2a["BirthState"] = "BirthStateA";
dt2.Rows.Add(row2a);
DataRow row2b = dt2.NewRow();
row2b["Table2Column21"] = "Table2Column1_ValueH";
row2b["Name"] = "Table2_Name_NameH";
row2b["BirthCity"] = "BirthCityB";
row2b["BirthState"] = "BirthStateB";
dt2.Rows.Add(row2b);
ds.Tables.Add(dt2);
DataTable dt3 = new DataTable("TableThree");
dt3.Columns.Add("Table3Column31");
dt3.Columns.Add("Name");
dt3.Columns.Add("Price");
dt3.Columns.Add("QuantityOnHand");
DataRow row3a = dt3.NewRow();
row3a["Table3Column31"] = "Table3Column31_ValueM";
row3a["Name"] = "Table3_Name_Name00M";
row3a["Price"] = "PriceA";
row3a["QuantityOnHand"] = "QuantityOnHandA";
dt3.Rows.Add(row3a);
DataRow row3b = dt3.NewRow();
row3b["Table3Column31"] = "Table3Column31_ValueN";
row3b["Name"] = "Table3_Name_Name00N";
row3b["Price"] = "PriceB";
row3b["QuantityOnHand"] = "QuantityOnHandB";
dt3.Rows.Add(row3b);
ds.Tables.Add(dt3);
string foundDataTable1Name = FindDataTableName(ds, new List<string> { "Table1Column11", "Name" });
/* foundDataTable1Name should be 'TableOne' */
string foundDataTable2Name = FindDataTableName(ds, new List<string> { "Table2Column21", "Name" });
/* foundDataTable2Name should be 'TableTwo' */
string foundDataTable3Name = FindDataTableName(ds, new List<string> { "Table3Column31", "Name" });
/* foundDataTable3Name should be 'TableThree' */
string foundDataTableThrowsExceptionName = FindDataTableName(ds, new List<string> { "Name" });
/* should throw an exception, as 'Name' is in multiple (distinct) tables */
}
public static string FindDataTableName(DataSet ds, List<string> columnNames)
{
string returnValue = string.Empty;
DataTable foundDataTable = FindDataTable(ds, columnNames);
if (null != foundDataTable)
{
returnValue = foundDataTable.TableName;
}
return returnValue;
}
public static DataTable FindDataTable(DataSet ds, List<string> columnNames)
{
DataTable returnItem = null;
if (null == ds || null == columnNames)
{
return null;
}
List<DataTable> tables =
ds.Tables
.Cast<DataTable>()
.SelectMany
(t => t.Columns.Cast<DataColumn>()
.Where(c => columnNames.Contains(c.ColumnName))
)
.Select(c => c.Table).Distinct().ToList();
if (null != tables)
{
if (tables.Count <= 1)
{
returnItem = tables.FirstOrDefault();
}
else
{
throw new IndexOutOfRangeException(string.Format("FindDataTable found more than one matching Table based on the input column names. ({0})", String.Join(", ", columnNames.ToArray())));
}
}
return returnItem;
}
I tried this too (to no avail) (always has 0 matches)
List<DataTable> tables =
ds.Tables
.Cast<DataTable>()
.Where
(t => t.Columns.Cast<DataColumn>()
.All(c => columnNames.Contains(c.ColumnName))
)
.Distinct().ToList();
It sounds to me like you're trying to check whether all of the columnNames passed to the method are contained in a table's collection of column names. If that's the case, this should do the job.
List<DataTable> tables =
ds.Tables
.Cast<DataTable>()
.Where(dt => !columnNames.Except(dt.Columns.Select(c => c.Name)).Any())
.ToList();
(Below is an append by the asker of the question)
Well, I had to tweak it to make it compile, but you got me there..
Thanks.
Final Answer:
List<DataTable> tables =
ds.Tables.Cast<DataTable>()
.Where
(dt => !columnNames.Except(dt.Columns.Cast<DataColumn>()
.Select(c => c.ColumnName))
.Any()
)
.ToList();
Final Answer (which is not case sensitive):
List<DataTable> tables =
ds.Tables.Cast<DataTable>()
.Where
(dt => !columnNames.Except(dt.Columns.Cast<DataColumn>()
.Select(c => c.ColumnName), StringComparer.OrdinalIgnoreCase)
.Any()
)
.ToList();
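An alternative way to express "the table has all of these columns" is to ask each table's column collection directly. This is a sketch, not the accepted approach above: ColumnMatchDemo and FindTables are hypothetical names, and it leans on DataColumnCollection.Contains(string), which I believe ignores case when matching names, so check the documentation if case sensitivity matters to you.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class ColumnMatchDemo
{
    // Returns every table that contains ALL of the requested column names.
    public static List<DataTable> FindTables(DataSet ds, IEnumerable<string> columnNames)
    {
        List<string> wanted = columnNames.ToList();
        return ds.Tables.Cast<DataTable>()
            .Where(t => wanted.All(name => t.Columns.Contains(name)))
            .ToList();
    }

    public static void Main()
    {
        var ds = new DataSet();

        var t1 = new DataTable("TableOne");
        t1.Columns.Add("Table1Column11");
        t1.Columns.Add("Name");
        ds.Tables.Add(t1);

        var t2 = new DataTable("TableTwo");
        t2.Columns.Add("Table2Column21");
        t2.Columns.Add("Name");
        ds.Tables.Add(t2);

        foreach (DataTable t in FindTables(ds, new[] { "Table2Column21", "Name" }))
            Console.WriteLine(t.TableName);
    }
}
```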

Convert datarow(only single column) to a string list

Please look at what is wrong? I want to convert datarow to a string list.
public List<string> GetEmailList()
{
// get DataTable dt from somewhere.
List<DataRow> drlist = dt.AsEnumerable().ToList();
List<string> sEmail = new List<string>();
foreach (object str in drlist)
{
sEmail.Add(str.ToString()); // exception
}
return sEmail; // Ultimately to get a string list.
}
Thanks for help.
There are several problems here, but the biggest one is that you're trying to turn an entire row into a string, when really you should be turning just a single cell into a string. You need to reference the first column of that DataRow, which you can do with brackets (like an array).
Try something like this instead:
public List<string> GetEmailList()
{
// get DataTable dt from somewhere.
List<string> sEmail = new List<string>();
foreach (DataRow row in dt.Rows)
{
sEmail.Add(row[0].ToString());
}
return sEmail; // Ultimately to get a string list.
}
Here's a one liner I got from ReSharper on how to do this, not sure of the performance implications, just thought I'd share it.
List<string> companyProxies = Enumerable.Select(
vendors.Tables[0].AsEnumerable(), vendor => vendor["CompanyName"].ToString()).ToList();
Here is how it would be with all additional syntax noise stripped: one-liner.
public List<string> GetEmailList()
{
return dt.AsEnumerable().Select(r => r[0].ToString()).ToList();
}
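One thing the ToString() versions gloss over is empty cells: calling ToString() on a DBNull cell yields an empty string. If that matters, a typed read with Field<string> returns null for such cells, which can then be filtered out. The sketch below assumes the email column is the first column and is of type string. EmailListDemo and its GetEmailList(DataTable) overload are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class EmailListDemo
{
    // Reads the first column as strings and skips cells that are DBNull.
    public static List<string> GetEmailList(DataTable dt)
    {
        return dt.AsEnumerable()
            .Select(row => row.Field<string>(0)) // null when the cell is DBNull
            .Where(email => email != null)
            .ToList();
    }

    public static void Main()
    {
        var dt = new DataTable("Contacts");
        dt.Columns.Add("Email", typeof(string));
        dt.Rows.Add("a@example.com");
        dt.Rows.Add(DBNull.Value);
        dt.Rows.Add("b@example.com");

        foreach (string email in GetEmailList(dt))
            Console.WriteLine(email);
    }
}
```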
The Linq way ...
private static void Main(string[] args)
{
var dt = getdt();
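var output = dt
.Rows
.Cast<DataRow>()
.Select(dr => dr[0].ToString()) // project each row to its first cell so the result is a List<string>
.ToList();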
// or a CSV line
var csv = dt
.Rows
.Cast<DataRow>()
.Aggregate(new StringBuilder(), (sb, dr) => sb.AppendFormat(",{0}", dr[0]))
.Remove(0, 1);
Console.WriteLine(csv);
Console.ReadLine();
}
private static DataTable getdt()
{
var dc = new DataColumn("column1");
var dt = new DataTable("table1");
dt.Columns.Add(dc);
Enumerable.Range(0, 10)
.AsParallel()
.Select(i => string.Format("row {0}", i))
.ToList()
.ForEach(s =>
{
var dr = dt.NewRow();
dr[dc] = s;
dt.Rows.Add(dr);
});
return dt;
}
