How to query DataTable

I have a DataTable like this:
column1 column2 column3
a b c
d e f
I want to get the row and column indexes of the cell containing "e", and I wrote this:
int[] indexrowcol = new int[2];
for (int i = 0; i < dt.Columns.Count; i++)
{
    for (int j = 0; j < dt.Rows.Count; j++)
    {
        if (dt.Rows[i][j] == "e")
        {
            indexrowcol[0] = j; indexrowcol[1] = i;
        }
    }
}
How can I write the same thing using LINQ? Thanks.

I don't believe your original code is implemented correctly for what you're after, but it's more or less clear what you're trying to do. Here's some commented LINQ code that can accomplish it.
var valToSearch = "e";
int[] indexrowcol = dt.AsEnumerable() // allows you to use linq
    .SelectMany((row, rix) => // like 'Select', but stacks up listed output
        row.ItemArray.Select( // ItemArray gets the row as an array
            (col, cix) => new { rix, cix, value = col.ToString() }
        )
    )
    .Where(obj => obj.value == valToSearch)
    .Select(obj => new int[] { obj.rix, obj.cix })
    .FirstOrDefault();
When I use the above code on the following DataTable, I get the result [1,1], which is the same result I get using your original code when I correct for the i/j reversal that existed at the time of this writing.
var dt = new DataTable();
dt.Columns.Add("Column1");
dt.Columns.Add("Column2");
dt.Columns.Add("Column3");
DataRow rw = dt.NewRow();
rw["Column1"] = "a";
rw["Column2"] = "b";
rw["Column3"] = "c";
dt.Rows.Add(rw);
rw = dt.NewRow();
rw["Column1"] = "d";
rw["Column2"] = "e";
rw["Column3"] = "f";
dt.Rows.Add(rw);
The reason your original code isn't quite right is that you use 'i' for columns and 'j' for rows, but then call dt.Rows[i][j], which is backwards. I highly recommend naming your variables so that they can be matched to what they represent. This is why I use names such as col, row, cix (column index), and rix (row index) to keep things straight.
In that vein, you might want to also output something other than an int[2]. Maybe a class or struct, or even just leave it as an anonymous object (get rid of the 'select' part of my query). Though I don't know your end use case, so I'll leave you alone on that.
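As an illustration of returning something more descriptive than an int[2], here is a minimal sketch that wraps the same query in a helper returning a nullable named tuple. The names CellSearch and FindCell are hypothetical, and the sketch assumes a compiler with tuple support (C# 7 or later).

```csharp
using System;
using System.Data;
using System.Linq;

public static class CellSearch
{
    // Hypothetical helper: returns the position of the first cell whose text
    // equals `value`, or null when no cell matches.
    public static (int Row, int Col)? FindCell(DataTable table, string value)
    {
        return table.AsEnumerable()
            .SelectMany((row, rix) => row.ItemArray
                // Convert.ToString turns DBNull into an empty string
                .Select((cell, cix) => new { rix, cix, text = Convert.ToString(cell) }))
            .Where(c => c.text == value)
            .Select(c => ((int Row, int Col)?)(c.rix, c.cix))
            .FirstOrDefault();
    }
}
```

A caller can then check hit.HasValue and read hit.Value.Row and hit.Value.Col, which avoids having to remember which array slot holds which index.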

Related

Shifting array and add duplicate of previous value

In my code I have a string array with 1000 elements, and it contains unique string data. Now I want to duplicate some of those elements without overwriting the element that follows. In other words, I would like to shift the array and insert a duplicate value.
Here is my code:
for (int r = 0; r < sResultarray1.Length; r++)
{
if (sResultarray1[r] != null &&
sResultarray1[r].Contains("WP") &&
sResultarray1[r].Contains("CB") == true)
{
index = Array.IndexOf(sResultarray1, sResultarray1[r]);
for (int e = 1000; e >= index + c; e--)
{
sResultarray1[e] = sResultarray1[e - c];
}
break;
}
}
My current output is
++ST110+WP001.CB001
++ST120+WP001.CB001
++ST120+WP002.CB001
++ST130+WP001.CB001
++ST110+WP001.CB001
++ST120+WP001.CB001
++ST120+WP002.CB001
++ST130+WP001.CB001
My desired output is
++ST110+WP001.CB001
++ST110+WP001.CB001
++ST120+WP001.CB001
++ST120+WP001.CB001
++ST120+WP002.CB001
++ST120+WP002.CB001
++ST130+WP001.CB001
++ST130+WP001.CB001
Can anyone help me solve this problem?

I suggest using a different collection type, List<string> instead of string[] (at least temporarily): Add and Insert ("shift and add") are not operations that arrays were designed for. Something like this:
using System.Collections.Generic;
using System.Linq;
...
// Temporary collection - a list
List<string> list = sResultarray1.ToList();
// Downward - we don't want to check the inserted item
for (int r = list.Count - 1; r >= 0; --r) {
    if (list[r] != null && list[r].Contains("WP") && list[r].Contains("CB")) {
        // if you want to insert - "shift and add duplicate value" - just insert
        list.Insert(r + 1, list[r]);
    }
}
// back to the array
sResultarray1 = list.ToArray();

C# DataTable column already exists issue

I'm attempting to import a CSV file into a DataTable, however the CSV contains headers that are the same. (For example, there are multiple "Date" headers for different form sections). To fix this, I decided to create a loop that will run through the headers and replace the duplicates or unwanted entries based on their position. I've replaced my replaceWith array with dummy entries, but my actual code does have the correct size to correlate with the replace array.
string[] columnNames = null;
string[] oStreamDataValues = null;
int[] error = {0,1,2,3,4,7,8,9,10,11,15,21,34,37,57,61,65,68,69,71,75,79,82,83,85,89,93,96,97,99,103,107,110,111,113,117,121,124,125,127,128,129,130,132,182,210,212,213,214,215,216,222,226,239};
int[] replace = {14,16,17,17,20,23,24,27,28,29,31,32,44,58,59,60,62,63,64,66,67,70,72,73,74,76,77,78,80,81,84,86,87,88,90,91,92,94,95,98,100,101,102,104,105,106,108,109,112,114,115,116,118,119,120,122,123,126,134,136,138,140,142,144,146,148,150,152,154,156,158,160,162,164,166,168,170,172,174,176,178,180,184,186,187,188,190,191,192,194,195,196,198,199,200,202,203,204,206,207,208,209,236,242,243,244};
string[] replaceWith = {"Replace 1", "Replace 2", "Replace 3"};
string fix = "ignore_";
int inc = 00;
string entry = "";
while (!oStreamReader.EndOfStream)
{
string oStreamRowData = oStreamReader.ReadLine().Trim();
if (oStreamRowData.Length > 0)
{
//oStreamDataValues = Regex.Split(oStreamRowData, ",(?=(?:[^']*'[^']*')*[^']*$)");
oStreamDataValues = oStreamRowData.Split(',');
if (rowCount == 0)
{
rowCount = 1;
columnNames = oStreamDataValues;
for (int i = 0; i < columnNames.Length; i++)
{
for (int j = 0; j < error.Length; j++)
{
if (error[j] == i)
{
entry = fix + inc++;
}
}
for (int k = 0; k < replace.Length; k++)
{
if (replace[i] == i)
{
int add = 0;
entry = replaceWith[add++];
}
}
DataColumn oDataColumn = new DataColumn(entry, typeof(string));
oDataColumn.DefaultValue = string.Empty;
oDataTable.Columns.Add(oDataColumn);
}
}
}
I'm no coding expert, so my syntax/decision making isn't perfect.
However the error that I get is that A column named 'ignore_4' already belongs to this DataTable.
I assume something is incorrect in my loop logic.

I think you have overcomplicated the loops. You just need to keep an index of the current position in the array of errors and array of replaces.
string rep = "replace_"; // base string for replace fields
string fix = "ignore_"; // base string for ignore fields
// For demonstration purposes I have commented out this array. If you
// want every 'replace' column to have its own specific name, prepare this
// array with exactly as many names as there are elements in the
// replace array
//
// string[] replaceWith = {"Replace 1", "Replace 2", "Replace 3"};
int idxErrors = 0; // Current position in the error array
int idxReplace = 0; // Current position in the replace array
int fixCounter = 1;
int repCounter = 1;
string entry = "";
for (int i = 0; i < columnNames.Length; i++)
{
    // Is this the index of a column that should be ignored?
    if (idxErrors < error.Length && i == error[idxErrors])
    {
        entry = fix + fixCounter.ToString("D2");
        idxErrors++;
        fixCounter++;
    }
    // Is this the index of a column that should have a different name?
    else if (idxReplace < replace.Length && i == replace[idxReplace])
    {
        entry = rep + repCounter.ToString("D2");
        // entry = replaceWith[idxReplace];
        idxReplace++;
        repCounter++;
    }
    else
        entry = columnNames[i];
    // Now create the column
    DataColumn oDataColumn = new DataColumn(entry, typeof(string));
    oDataColumn.DefaultValue = string.Empty;
    oDataTable.Columns.Add(oDataColumn);
}
In this example I have used the same naming pattern for the renamed columns as for the ignored ones. If you want to give each renamed column a proper name, prepare an array with those names, of the same length as the replace array, and use idxReplace to take the correct name from it. Note that this approach assumes both index arrays are sorted in ascending order and contain no duplicates; the replace array in the question lists 17 twice, which would leave idxReplace stuck at the second 17 so that no later column is renamed.
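The variant with proper names taken from an array could look roughly like the sketch below. ColumnNamer and BuildNames are hypothetical names, and the sketch assumes the two index arrays are sorted ascending, free of duplicates, and that there is exactly one replacement name per index in the rename array.

```csharp
using System;

public static class ColumnNamer
{
    // Hypothetical helper. Assumes `ignore` and `rename` are sorted ascending
    // with no duplicates, and that renameWith has one name per rename index.
    public static string[] BuildNames(string[] headers, int[] ignore, int[] rename, string[] renameWith)
    {
        if (renameWith.Length != rename.Length)
            throw new ArgumentException("renameWith must have one name per rename index.");

        var names = new string[headers.Length];
        int idxIgnore = 0; // current position in the ignore array
        int idxRename = 0; // current position in the rename array
        for (int i = 0; i < headers.Length; i++)
        {
            if (idxIgnore < ignore.Length && i == ignore[idxIgnore])
            {
                // Ignored columns get a generated, unique placeholder name
                names[i] = "ignore_" + (idxIgnore + 1).ToString("D2");
                idxIgnore++;
            }
            else if (idxRename < rename.Length && i == rename[idxRename])
            {
                // Renamed columns take the name at the same position
                names[i] = renameWith[idxRename];
                idxRename++;
            }
            else
            {
                names[i] = headers[i];
            }
        }
        return names;
    }
}
```

The returned names can then be passed to new DataColumn(name, typeof(string)) one by one, as in the loop above. Duplicate names among the untouched headers would still make Columns.Add throw, so every repeated header needs to appear in one of the two index arrays.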

for loop through ms sql columns

I'm coding in c# on webpages/razor with MS SQL database
I have a table with the following columns
Sat1
Sat2
Sat3
Sat4
...
Sat25
I want to loop through each of these, and assign the value to satAvail
I have the following
for (var i = 1; i < 26; i++)
{
satWeek = "Sat" + i;
satAvail = item.satWeek;
}
I want the equivalent of satAvail = item.Sat1;
I've tried a few different lines but having no joy

Use reflection:
var value = item.GetType().GetProperty("Sat" + i).GetValue(item, null);
And if you want a sum (assuming Sat1 ... Sat25 are integers):
var sum = 0;
for (var i = 1; i < 26; i++) {
    sum += (int)item.GetType().GetProperty("Sat" + i).GetValue(item, null);
}
satAvail = sum;
Or the LINQ way:
var sum = Enumerable.Range(1, 25)
    .Select(x => (int)item.GetType().GetProperty("Sat" + x).GetValue(item, null))
    .Sum();

It's not clear if you're using an ORM or ADO, but assuming ADO, you could use something like:
DataTable dt = new DataTable(); // populate dt from your query first
foreach (DataRow row in dt.Rows)
{
    foreach (DataColumn column in dt.Columns)
    {
        var satAvail = row[column];
    }
}

I'm not sure I'm clear on your actual requirement, but in general, when working with the Database helper, if you want to access a column value resulting from a Database.Query or Database.QuerySingle call, you can either do it using dot notation or an indexer.
For example, you may get data doing this:
var db = Database.Open("MyDatabase");
var item = db.QuerySingle("SELECT * FROM Mytable WHERE ID = 1");
If you now want to access the value of a column called Sat1, you would use item.Sat1. However, if the column name is held in a variable, you need to use an indexer instead:
var satWeek = "Sat" + "1";
var satAvail = item[satWeek];
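Applied to the loop in the question, the indexer approach might look like the following sketch. It uses a System.Data.DataRow as a stand-in for the record returned by the Database helper, since both expose a string indexer; SatColumns and SumSat are hypothetical names, and the columns are assumed to hold integers.

```csharp
using System;
using System.Data;

public static class SatColumns
{
    // Hypothetical helper: sums columns Sat1..Sat<count> of a row by
    // building each column name at run time and using the string indexer.
    public static int SumSat(DataRow row, int count)
    {
        int total = 0;
        for (int i = 1; i <= count; i++)
        {
            string satWeek = "Sat" + i;            // "Sat1", "Sat2", ...
            total += Convert.ToInt32(row[satWeek]); // indexer lookup by name
        }
        return total;
    }
}
```

With the Database helper, the body of the loop would read item[satWeek] instead of row[satWeek].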

Comparing Sum Methods in C#

I am working on a section of a project that uses a large number of sum operations. These are applied to a DataTable.
To find the best method, I use the following.
DataTable structure
class LogParser
{
public DataTable PGLStat_Table = new DataTable();
public LogParser()
{
PGLStat_Table.Columns.Add("type", typeof(string));
PGLStat_Table.Columns.Add("desc", typeof(string));
PGLStat_Table.Columns.Add("count", typeof(int));
PGLStat_Table.Columns.Add("duration", typeof(decimal));
PGLStat_Table.Columns.Add("cper", typeof(decimal));
PGLStat_Table.Columns.Add("dper", typeof(decimal));
PGLStat_Table.Columns.Add("occurancedata", typeof(string));
}
}
The following code is used to fill the table
LogParser pglp = new LogParser();
Random r2 = new Random();
for (int i = 1; i < 1000000; i++)
{
int c2 = r2.Next(1, 1000);
pglp.PGLStat_Table.Rows.Add("Type" + i.ToString(), "desc" + i , c2, 0, 0, 0, " ");
}
The sum is applied to the count column, which holds the value of c2.
The following methods are used to calculate the sum
Method 1 using Compute
Stopwatch s2 = new Stopwatch();
s2.Start();
object sumObject;
sumObject = pglp.PGLStat_Table.Compute("Sum(count)", " ");
s2.Stop();
long d1 = s2.ElapsedMilliseconds;
Method 2 using Foreach loop
s2.Restart();
int totalcount = 0;
foreach (DataRow dr in pglp.PGLStat_Table.Rows)
{
int c = Convert.ToInt32(dr["count"].ToString());
totalcount = totalcount + c;
}
s2.Stop();
long d2 = s2.ElapsedMilliseconds;
Method 3 using Linq
s2.Restart();
var sum = pglp.PGLStat_Table.AsEnumerable().Sum(x => x.Field<int>("count"));
MessageBox.Show(sum.ToString());
s2.Stop();
long d3 = s2.ElapsedMilliseconds;
After Comparison the results are
a) foreach is the fastest 481ms
b) next is linq 1016ms
c) and then Compute 2253ms
Query 1
I accidentally changed "c2" to "i" in the following statement
pglp.PGLStat_Table.Rows.Add("Type" + i.ToString(), "desc" + i , i, 0, 0, 0, " ");
The LINQ statement produces an error
Arithmetic operation resulted in an overflow.
Whereas Compute and the foreach loop are still able to complete the computation, although the result may be incorrect.
Is such behaviour a cause for concern, or am I missing a directive?
(also the figures computed are large)
Query 2
I was under the impression that LINQ is the fastest. Is there an optimized method or parameter
that makes it perform better?
thanks for advice
arvind

The fastest sum is the following (look up the DataColumn once and cast directly to int):
static int Sum(LogParser pglp)
{
    var column = pglp.PGLStat_Table.Columns["count"];
    int totalcount = 0;
    foreach (DataRow dr in pglp.PGLStat_Table.Rows)
    {
        totalcount += (int)dr[column];
    }
    return totalcount;
}
Timings:
00:00:00.1442297, for/each, by column, (int)
00:00:00.1595430, for/each, by column, Field<int>
00:00:00.6961964, for/each, by name, Convert.ToInt
00:00:00.1959104, linq, cast<DataRow>, by column, (int)
Other code:
static int Sum_ForEach_ByColumn_Field(LogParser pglp)
{
var column = pglp.PGLStat_Table.Columns["count"];
int totalcount = 0;
foreach (DataRow dr in pglp.PGLStat_Table.Rows)
{
totalcount += dr.Field<int>(column);
}
return totalcount;
}
static int Sum_ForEach_ByName_Convert(LogParser pglp)
{
int totalcount = 0;
foreach (DataRow dr in pglp.PGLStat_Table.Rows)
{
int c = Convert.ToInt32(dr["count"].ToString());
totalcount = totalcount + c;
}
return totalcount;
}
static int Sum_Linq(LogParser pglp)
{
var column = pglp.PGLStat_Table.Columns["count"];
return pglp.PGLStat_Table.Rows.Cast<DataRow>().Sum(row => (int)row[column]);
}
var data = GenerateData();
// Warm-up calls so that JIT compilation is not included in the timings
Sum(data);
Sum_Linq(data);
var count = 3;
foreach (var info in new[]
{
    new {Name = "for/each, by column, (int)", Method = (Func<LogParser, int>)Sum},
    new {Name = "for/each, by column, Field<int>", Method = (Func<LogParser, int>)Sum_ForEach_ByColumn_Field},
    new {Name = "for/each, by name, Convert.ToInt", Method = (Func<LogParser, int>)Sum_ForEach_ByName_Convert},
    new {Name = "linq, cast<DataRow>, by column, (int)", Method = (Func<LogParser, int>)Sum_Linq},
})
{
    var watch = new Stopwatch();
    for (var i = 0; i < count; ++i)
    {
        watch.Start();
        var sum = info.Method(data);
        watch.Stop();
    }
    // Average time per run
    Console.WriteLine("{0}, {1}", TimeSpan.FromTicks(watch.Elapsed.Ticks / count), info.Name);
}

Well, you could improve the LINQ example a bit (AsEnumerable), but this is expected behavior: LINQ to Objects cannot be faster than a loop (you could do even better by using a for (var i = ...) loop instead of the foreach). I guess what you meant to use was LINQ to SQL, where the aggregation (sum) is done by the database and should be faster, but you don't seem to be using database data...

Query 1.
As you can see in the documentation, the Enumerable.Sum extension method throws an OverflowException on integer overflow. DataTable.Compute has no such check, and neither do the integer operations you use in Method 2.
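The difference between the checked LINQ sum and plain integer addition can be reproduced without a DataTable. The sketch below uses hypothetical names (OverflowDemo and its methods) and an explicit unchecked block, so it does not depend on the compiler's default overflow setting.

```csharp
using System;
using System.Linq;

public static class OverflowDemo
{
    // Enumerable.Sum over int values uses checked arithmetic and throws on overflow.
    public static bool LinqSumThrows(int[] values)
    {
        try
        {
            values.Sum();
            return false;
        }
        catch (OverflowException)
        {
            return true;
        }
    }

    // Plain += in an unchecked context silently wraps around instead of throwing.
    public static int UncheckedSum(int[] values)
    {
        int total = 0;
        unchecked
        {
            foreach (int v in values) total += v;
        }
        return total;
    }

    // Widening each value to long before summing avoids the int overflow.
    public static long WideSum(int[] values)
    {
        return values.Sum(v => (long)v);
    }
}
```

For totals that can exceed int.MaxValue, as with the values 1 to 999999 in the question, accumulating into a long gives the correct result in both the loop and the LINQ version.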
UPDATE:
Query 2.
I was under the impression Linq does it fastest, is there a optimized method or parameter that makes it perform better.
AFAIK, there is no way to optimize the array summation algorithm itself (without using parallel computing). LINQ takes about twice as long as foreach, so I don't think this is about LINQ performance but about Compute's inefficiency (note that there is overhead for interpreting the expression string).

How to specify format for individual cells with Excel.Range.set_Value()

When I write a whole table into an excel worksheet, I know to work with a whole Range at once instead of writing to individual cells. However, is there a way to specify format as I'm populating the array I'm going to export to Excel?
Here's what I do now:
object MissingValue = System.Reflection.Missing.Value;
Excel.Application excel = new Excel.Application();
int rows = 5;
int cols = 5;
int someVal;
Excel.Worksheet sheet = (Excel.Worksheet)excel.Workbooks.Add(MissingValue).Sheets[1];
Excel.Range range = sheet.get_Range("A1", sheet.Cells[rows, cols]);
object[,] rangeData = new object[rows, cols];
for (int r = 0; r < rows; r++)
{
    for (int c = 0; c < cols; c++)
    {
        someVal = r + c;
        rangeData[r, c] = someVal.ToString();
    }
}
range.set_Value(MissingValue, rangeData);
Now suppose that I want some of those numbers to be formatted as percentages. I know I can go back on a cell-by-cell basis and change the formatting, but that seems to defeat the whole purpose of using a single Range.set_Value() call. Can I make my rangeData[,] structure include formatting information, so that when I call set_Value(), the cells are formatted in the way I want them?
To clarify, I know I can set the format for the entire Excel.Range object. What I want is to have a different format specified for each cell, specified in the inner loop.

So here's the best "solution" I've found so far. It isn't the nirvana I was looking for, but it's much, much faster than setting the format for each cell individually.
// 0-based indexes
static string RcToA1(int row, int col)
{
    // Convert the 0-based column index to letters (0 -> A, 25 -> Z, 26 -> AA)
    string letters = "";
    int n = col + 1; // 1-based column number
    while (n > 0)
    {
        letters = (char)('A' + (n - 1) % 26) + letters;
        n = (n - 1) / 26;
    }
    return letters + (row + 1).ToString();
}
static Random rand = new Random(DateTime.Now.Millisecond);
static string RandomExcelFormat()
{
    switch ((int)Math.Round(rand.NextDouble(), 0))
    {
        case 0: return "0.00%";
        default: return "0.00";
    }
}
struct ExcelFormatSpecifier
{
    public object NumberFormat;
    public string RangeAddress;
}
static void DoWork()
{
    List<ExcelFormatSpecifier> NumberFormatList = new List<ExcelFormatSpecifier>(0);
    object[,] rangeData = new object[rows, cols];
    for (int r = 0; r < rows; r++)
    {
        for (int c = 0; c < cols; c++)
        {
            someVal = r + c;
            rangeData[r, c] = someVal.ToString();
            // Remember which format this cell should get
            NumberFormatList.Add(new ExcelFormatSpecifier
            {
                NumberFormat = RandomExcelFormat(),
                RangeAddress = RcToA1(r, c)
            });
        }
    }
    range.set_Value(MissingValue, rangeData);
    int max_format = 50;
    foreach (string formatSpecifier in NumberFormatList.Select(p => p.NumberFormat).Distinct())
    {
        // Compare by value: NumberFormat is declared as object
        List<string> addresses = NumberFormatList.Where(p => formatSpecifier.Equals(p.NumberFormat)).Select(p => p.RangeAddress).ToList();
        // Apply the format to at most max_format cells per call
        while (addresses.Count > 0)
        {
            string addressSpecifier = string.Join(",", addresses.Take(max_format).ToArray());
            range.get_Range(addressSpecifier, MissingValue).NumberFormat = formatSpecifier;
            addresses = addresses.Skip(max_format).ToList();
        }
    }
}
Basically what is happening is that I keep a list of the format information for each cell in NumberFormatList (each element also holds the A1-style address of the range it applies to). The original idea was that for each distinct format in the worksheet, I should be able to construct an Excel.Range of just those cells and apply the format to that range in a single call. This would reduce the number of accesses to NumberFormat from (potentially) thousands down to just a few (however many different formats you have).
I ran into an issue, however, because you apparently can't construct a range from an arbitrarily long list of cells. After some testing, I found that the limit is somewhere between 50 and 100 cells that can be used to define an arbitrary range (as in range.get_Range("A1,B1,C1,A2,AA5,.....")). So once I've gotten the list of all cells to apply a format to, I have one final while() loop that applies the format to 50 of those cells at a time.
This isn't ideal, but it still reduces the number of accesses to NumberFormat by a factor of up to 50, which is significant. Constructing my spreadsheet without any format info (only using range.set_Value()) takes about 3 seconds. When I apply the formats 50 cells at a time, that is lengthened to about 10 seconds. When I apply the format info individually to each cell, the spreadsheet takes over 2 minutes to finish being constructed!
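The batching step can be looked at separately from Excel. The following sketch (AddressBatcher and Batch are hypothetical names) splits a list of A1-style addresses into comma-joined groups of at most a given size, which is the string shape passed to get_Range above.

```csharp
using System;
using System.Collections.Generic;

public static class AddressBatcher
{
    // Hypothetical helper: joins cell addresses into comma-separated groups
    // holding at most batchSize addresses each.
    public static List<string> Batch(IEnumerable<string> addresses, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batches = new List<string>();
        var current = new List<string>(batchSize);
        foreach (string address in addresses)
        {
            current.Add(address);
            if (current.Count == batchSize)
            {
                batches.Add(string.Join(",", current));
                current.Clear();
            }
        }
        // Flush the last, possibly smaller, group
        if (current.Count > 0) batches.Add(string.Join(",", current));
        return batches;
    }
}
```

Each returned string can then be used for one NumberFormat assignment. Walking the list once also avoids the repeated Skip and ToList calls, which copy the remaining addresses on every pass.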

You can apply formatting to the range and then populate it with values. You cannot specify formatting in your object[,] array.

You apply the formatting to each individual cell within the inner loop via
for (int r = 0; r < rows; r++)
{
    for (int c = 0; c < cols; c++)
    {
        // Excel cell indexes are 1-based
        Excel.Range r2 = (Excel.Range)sheet.Cells[r + 1, c + 1];
        r2.xxxx = "";
    }
}
Once you have r2, you can change the cell format any way you want.
