I am working on a section of a project that uses a large number of sum methods. These sum methods are applied to a DataTable.
To test which method is best, I use the following DataTable structure:
class LogParser
{
public DataTable PGLStat_Table = new DataTable();
public LogParser()
{
PGLStat_Table.Columns.Add("type", typeof(string));
PGLStat_Table.Columns.Add("desc", typeof(string));
PGLStat_Table.Columns.Add("count", typeof(int));
PGLStat_Table.Columns.Add("duration", typeof(decimal));
PGLStat_Table.Columns.Add("cper", typeof(decimal));
PGLStat_Table.Columns.Add("dper", typeof(decimal));
PGLStat_Table.Columns.Add("occurancedata", typeof(string));
}
}
The following code is used to fill the table:
LogParser pglp = new LogParser();
Random r2 = new Random();
for (int i = 1; i < 1000000; i++)
{
int c2 = r2.Next(1, 1000);
pglp.PGLStat_Table.Rows.Add("Type" + i.ToString(), "desc" + i , c2, 0, 0, 0, " ");
}
The sum is applied to the count column, which holds the value of c2.
The following methods are used to calculate the sum.
Method 1: using Compute
Stopwatch s2 = new Stopwatch();
s2.Start();
object sumObject;
sumObject = pglp.PGLStat_Table.Compute("Sum(count)", " ");
s2.Stop();
long d1 = s2.ElapsedMilliseconds;
Method 2: using a foreach loop
s2.Restart();
int totalcount = 0;
foreach (DataRow dr in pglp.PGLStat_Table.Rows)
{
int c = Convert.ToInt32(dr["count"].ToString());
totalcount = totalcount + c;
}
s2.Stop();
long d2 = s2.ElapsedMilliseconds;
Method 3: using LINQ
s2.Restart();
var sum = pglp.PGLStat_Table.AsEnumerable().Sum(x => x.Field<int>("count"));
MessageBox.Show(sum.ToString());
s2.Stop();
long d3 = s2.ElapsedMilliseconds;
After comparison, the results are:
a) foreach is the fastest: 481 ms
b) next is LINQ: 1016 ms
c) and then Compute: 2253 ms
Query 1
I accidentally changed c2 to i in the following statement:
pglp.PGLStat_Table.Rows.Add("Type" + i.ToString(), "desc" + i , i, 0, 0, 0, " ");
The LINQ statement produces an error:
Arithmetic operation resulted in an overflow.
whereas Compute and the foreach loop are still able to complete the computation, although the result may be incorrect.
Is such behaviour a cause for concern, or am I missing a directive?
(Also, the figures computed are large.)
Query 2
I was under the impression that LINQ does it fastest. Is there an optimized method or parameter that makes it perform better?
Thanks for the advice,
arvind
The fastest sum is the following (with a precomputed DataColumn reference and a direct cast to int):
static int Sum(LogParser pglp)
{
var column = pglp.PGLStat_Table.Columns["count"];
int totalcount = 0;
foreach (DataRow dr in pglp.PGLStat_Table.Rows)
{
totalcount += (int)dr[column];
}
return totalcount;
}
Statistics:
00:00:00.1442297, for/each, by column, (int)
00:00:00.1595430, for/each, by column, Field<int>
00:00:00.6961964, for/each, by name, Convert.ToInt
00:00:00.1959104, linq, cast<DataRow>, by column, (int)
Other code:
static int Sum_ForEach_ByColumn_Field(LogParser pglp)
{
var column = pglp.PGLStat_Table.Columns["count"];
int totalcount = 0;
foreach (DataRow dr in pglp.PGLStat_Table.Rows)
{
totalcount += dr.Field<int>(column);
}
return totalcount;
}
static int Sum_ForEach_ByName_Convert(LogParser pglp)
{
int totalcount = 0;
foreach (DataRow dr in pglp.PGLStat_Table.Rows)
{
int c = Convert.ToInt32(dr["count"].ToString());
totalcount = totalcount + c;
}
return totalcount;
}
static int Sum_Linq(LogParser pglp)
{
var column = pglp.PGLStat_Table.Columns["count"];
return pglp.PGLStat_Table.Rows.Cast<DataRow>().Sum(row => (int)row[column]);
}
var data = GenerateData(); // builds the LogParser table as in the question
// warm-up calls so JIT compilation does not skew the first measurement
Sum(data);
Sum_Linq(data);
var count = 3;
foreach (var info in new[]
{
new {Name = "for/each, by column, (int)", Method = (Func<LogParser, int>)Sum},
new {Name = "for/each, by column, Field<int>", Method = (Func<LogParser, int>)Sum_ForEach_ByColumn_Field},
new {Name = "for/each, by name, Convert.ToInt", Method = (Func<LogParser, int>)Sum_ForEach_ByName_Convert},
new {Name = "linq, cast<DataRow>, by column, (int)", Method = (Func<LogParser, int>)Sum_Linq},
})
{
var watch = new Stopwatch();
for (var i = 0; i < count; ++i)
{
watch.Start();
var sum = info.Method(data);
watch.Stop();
}
Console.WriteLine("{0}, {1}", TimeSpan.FromTicks(watch.Elapsed.Ticks / count), info.Name);
}
Well, you could improve a bit on the LINQ example (AsEnumerable), but this is expected behavior: LINQ to Objects cannot be faster than a plain loop (you could do even better by using a for (var i = ...) loop instead of the foreach). I guess what you meant to use was LINQ to SQL - then the aggregation (Sum) would be done on the database and it should be faster - but as you don't seem to be using database data...
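For reference, a minimal sketch of that indexed for loop variant (not from the original answer; it assumes the LogParser table from the question and a cached column reference):
static int Sum_For_ByIndex(LogParser pglp)
{
    // Sketch only: index into the DataRowCollection directly instead of using foreach.
    var rows = pglp.PGLStat_Table.Rows;
    var column = pglp.PGLStat_Table.Columns["count"];
    int totalcount = 0;
    for (int i = 0; i < rows.Count; i++)
    {
        totalcount += (int)rows[i][column];
    }
    return totalcount;
}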
Query 1.
As you can see in the documentation, the Enumerable.Sum extension method throws an OverflowException on integer overflow. DataTable.Compute has no such functionality, and neither do the unchecked integer operations you use in Method 2.
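A small standalone sketch of that difference (my own illustration, not from the question): with the default unchecked compiler setting, a plain int loop silently wraps around, while Enumerable.Sum throws.
using System;
using System.Linq;

class OverflowDemo
{
    static void Main()
    {
        int[] values = { int.MaxValue, 1 };

        // Plain loop: unchecked by default, so the result wraps to int.MinValue.
        int loopSum = 0;
        foreach (var v in values) loopSum += v;
        Console.WriteLine(loopSum); // -2147483648

        // Enumerable.Sum adds in a checked context and throws on overflow.
        try
        {
            Console.WriteLine(values.Sum());
        }
        catch (OverflowException)
        {
            Console.WriteLine("Sum() threw OverflowException");
        }
    }
}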
UPDATE:
Query 2.
I was under the impression Linq does it fastest, is there a optimized method or parameter that makes it perform better.
AFAIK, there is no method to optimize an array summation algorithm (without using parallel computing). LINQ takes about double the time of foreach, so I don't think this is about LINQ performance but about Compute's inefficiency (note that there is an overhead for interpreting the query string).
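If parallel computing is acceptable, a hedged PLINQ sketch of the sum would look like the following (whether it actually helps depends on row count and hardware, so measure before adopting it):
// Sketch: parallel sum over the "count" column using PLINQ.
// Assumes the column holds non-null ints; like Enumerable.Sum, this throws on overflow.
var parallelSum = pglp.PGLStat_Table.AsEnumerable()
    .AsParallel()
    .Sum(row => row.Field<int>("count"));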
Related
I have a DataTable like this:
column1 column2 column3
a b c
d e f
I want to get the index numbers of the cell "e", and I wrote this:
int[] indexrowcol = new int[2];
for (int i = 0; i < dt.Columns.Count ; i++)
{
for (int j = 0; j < dt.Rows.Count; j++)
{
if (dt.Rows[i][j] == "e")
{
indexrowcol[0] = j; indexrowcol[1] = i;
}
}
}
How can I write the same thing using LINQ? Thanks.
I don't believe your original code is implemented correctly to get what you're after, but at least it's more or less clear what you're trying to do. Here's some commented LINQ code that can accomplish it.
var valToSearch = "e";
int[] indexrowcol = dt.AsEnumerable() // allows you to use linq
.SelectMany((row,rix) => // like 'Select', but stacks up listed output
row.ItemArray.Select( // ItemArray gets the row as an array
(col,cix) => new { rix, cix, value = col.ToString() }
)
)
.Where(obj => obj.value == valToSearch)
.Select(obj => new int[] { obj.rix, obj.cix })
.FirstOrDefault();
When I use the above code on the following DataTable, I get the result [1,1], which is the same result I get using your original code when I correct for the i/j reversal that existed at the time of this writing.
var dt = new DataTable();
dt.Columns.Add("Column1");
dt.Columns.Add("Column2");
dt.Columns.Add("Column3");
DataRow rw = dt.NewRow();
rw["Column1"] = "a";
rw["Column2"] = "b";
rw["Column3"] = "c";
dt.Rows.Add(rw);
rw = dt.NewRow();
rw["Column1"] = "d";
rw["Column2"] = "e";
rw["Column3"] = "f";
dt.Rows.Add(rw);
The reason your original code isn't quite right is that you use i for columns and j for rows, but then call dt.Rows[i][j], which is backwards. I highly recommend naming your variables so they match what they refer to. This is why I use names such as col, row, cix (column index), and rix, to keep things straight.
In that vein, you might also want to output something other than an int[2] - maybe a class or struct, or even just leave it as an anonymous object (get rid of the Select part of my query). But I don't know your end use case, so I'll leave you alone on that.
I have an implementation where I need to loop through a collection of documents and, based on a certain condition, merge the documents.
The merge condition is very simple: if the present document's doctype is the same as the later document's doctype, copy all the pages from the later document, append them to the present document's pages, and remove the later document from the collection.
Note: Both response.documents and response.documents[].pages are List<> collections.
I was trying this but was getting the following exception once I remove a document:
Collection was modified; enumeration operation may not execute.
Here is the code:
int docindex = 0;
foreach( var document in response.documents)
{
string presentDoctype = string.Empty;
string laterDoctype = string.Empty;
presentDoctype = response.documents[docindex].doctype;
laterDoctype = response.documents[docindex + 1].doctype;
if (laterDoctype == presentDoctype)
{
response.documents[docindex].pages.AddRange(response.documents[docindex + 1].pages);
response.documents.RemoveAt(docindex + 1);
}
docindex = docindex + 1;
}
Ex:
response.documents[0].doctype = "BankStatement" //page count = 1
response.documents[1].doctype = "BankStatement" //page count = 2
response.documents[2].doctype = "BankStatement" //page count = 2
response.documents[3].doctype = "BankStatement" //page count = 1
response.documents[4].doctype = "BankStatement" //page count = 4
Expected result:
response.documents[0].doctype = "BankStatement" //page count = 10
Please suggest. I appreciate your help.
I would recommend looking at LINQ GroupBy and Distinct to process your response.documents.
Example (as I cannot use your class, I give an example using my own class):
Suppose you have DummyClass
public class DummyClass {
public int DummyInt;
public string DummyString;
public double DummyDouble;
public DummyClass() {
}
public DummyClass(int dummyInt, string dummyString, double dummyDouble) {
DummyInt = dummyInt;
DummyString = dummyString;
DummyDouble = dummyDouble;
}
}
Then doing GroupBy as shown,
DummyClass dc1 = new DummyClass(1, "This dummy", 2.0);
DummyClass dc2 = new DummyClass(2, "That dummy", 2.0);
DummyClass dc3 = new DummyClass(1, "These dummies", 2.0);
DummyClass dc4 = new DummyClass(2, "Those dummies", 2.0);
DummyClass dc5 = new DummyClass(3, "The dummies", 2.0);
List<DummyClass> dummyList = new List<DummyClass>() { dc1, dc2, dc3, dc4, dc5 };
var groupedDummy = dummyList.GroupBy(x => x.DummyInt).ToList();
This will create three groups, keyed by DummyInt.
Then to process the group you could do
for (int i = 0; i < groupedDummy.Count; ++i){
foreach (DummyClass dummy in groupedDummy[i]) { //this will process the i-th group (zero-based)
//do something with this group
//groupedDummy[0] will consist of "this" and "these", [1] of "that" and "those", and [2] of "the"
//Try it out!
}
}
In your case, you should create groups based on doctype.
Once you have grouped by doctype, everything else should follow fairly naturally; a sketch is given below.
Another LINQ method you might be interested in is Distinct, but I think for this case GroupBy is the primary method you would want to use.
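For illustration, a hedged sketch of how that grouping could look for the documents in the question (the doctype and pages member names are taken from your snippet; adjust them to your actual types):
// Sketch: collapse documents that share a doctype into a single document
// whose pages are the concatenation of the group's pages.
var merged = response.documents
    .GroupBy(d => d.doctype)
    .Select(g =>
    {
        var first = g.First();
        first.pages = g.SelectMany(d => d.pages).ToList();
        return first;
    })
    .ToList();
response.documents = merged;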
Use only "for loop" instead of "foreach".
foreach will hold the collection and cannot be modified while looping thru it.
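A minimal sketch of that idea applied to the merge in the question (member names assumed from your snippet); iterating backwards means RemoveAt never shifts an index that still has to be visited:
// Sketch: merge adjacent documents with the same doctype using a plain for loop.
for (int i = response.documents.Count - 2; i >= 0; i--)
{
    if (response.documents[i].doctype == response.documents[i + 1].doctype)
    {
        response.documents[i].pages.AddRange(response.documents[i + 1].pages);
        response.documents.RemoveAt(i + 1);
    }
}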
Here is an example using GroupBy; hope this helps.
//mock a collection
ICollection<string> collection1 = new List<string>();
for (int i = 0; i < 10; i++)
{
collection1.Add("BankStatement");
}
for (int i = 0; i < 5; i++)
{
collection1.Add("BankStatement2");
}
for (int i = 0; i < 4; i++)
{
collection1.Add("BankStatement3");
}
//merge and get count
var result = collection1.GroupBy(c => c).Select(c => new { name = c.First(), count = c.Count().ToString() }).ToList();
foreach (var item in result)
{
Console.WriteLine(item.name + ": " + item.count);
}
Just use AddRange():
response.documents[0].pages.AddRange(response.documents[1].pages);
It will append all pages of documents[1] to documents[0].
I'm coding in C# on WebPages/Razor with an MS SQL database.
I have a table with the following columns:
Sat1
Sat2
Sat3
Sat4
...
Sat25
I want to loop through each of these and assign the value to satAvail.
I have the following:
for (var i = 1; i < 26; i++)
{
satWeek = "Sat" + i;
satAvail = item.satWeek;
}
I want the equivalent of satAvail = item.Sat1;
I've tried a few different lines but am having no joy.
Use reflection:
var value = item.GetType().GetProperty("Sat" + i).GetValue(item, null);
and if you want a sum (assuming Sat1...Sat25 are integers):
var sum = 0;
for (var i = 1; i < 26; i++) {
sum +=(int)item.GetType().GetProperty("Sat" + i).GetValue(item, null);
}
satAvail = sum;
or the LINQ way:
var sum = Enumerable.Range(1, 25)
.Select(x => (int)item.GetType().GetProperty("Sat" + x).GetValue(item, null))
.Sum();
It's not clear if you're using an ORM or ADO.NET, but assuming ADO.NET you could use something like:
DataTable dt = new DataTable();
foreach (DataRow row in dt.Rows)
{
foreach (DataColumn column in dt.Columns)
{
var satAvail = row[column];
}
}
I'm not sure I'm clear on your actual requirement, but in general, when working with the Database helper, if you want to access a column value resulting from a Database.Query or Database.QuerySingle call, you can either do it using dot notation or an indexer.
For example, you may get data doing this:
var db = Database.Open("MyDatabase");
var item = db.QuerySingle("SELECT * FROM Mytable WHERE ID = 1");
If you now want to access the value of a column called Sat1, you would use item.Sat1. However, if the column name is represented as a variable, you would need to use an indexer instead:
var satWeek = "Sat" + "1";
var satAvail = item[satWeek];
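Applied to the loop in your question, a hedged sketch (assuming item comes from Database.QuerySingle as above and that the Sat columns exist on the row):
// Sketch: read Sat1..Sat25 through the indexer instead of dot notation.
for (var i = 1; i < 26; i++)
{
    var satWeek = "Sat" + i;
    var satAvail = item[satWeek]; // equivalent of item.Sat1, item.Sat2, ...
    // use satAvail here
}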
I have a list of DataTables like
List<DataTable> a = new List<DataTable>();
I want to make a deep copy of this list (i.e. copying each DataTable). My code currently looks like
List<DataTable> aCopy = new List<DataTable>();
for(int i = 0; i < a.Count; i++) {
aCopy.Add(a[i].Copy());
}
The performance is absolutely terrible, and I am wondering if there is a known way to speed up such a copy?
Edit: do not worry about why I have this or need to do this, just accept that it is part of a legacy code base that I cannot change
If you have to copy a DataTable, it is essentially an O(N) operation. If the DataTable is very large and causing a large amount of allocation, you may be able to speed up the operation by doing a section at a time, but you are essentially bounded by the working set.
You can try the following - it gave me a performance boost, although your mileage may vary! I've adapted it to your example to demonstrate how to copy a DataTable using an alternative mechanism: clone the table, then stream the data in. You could easily put this in an extension method.
List<DataTable> aCopy = new List<DataTable>();
for(int i = 0; i < a.Count; i++) {
DataTable sourceTable = a[i];
DataTable copyTable = sourceTable.Clone(); //Clones structure only
copyTable.Load(sourceTable.CreateDataReader()); //Streams the rows into the copy
aCopy.Add(copyTable); //Was missing: add the copy to the result list
}
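As mentioned, this fits naturally in an extension method; a hedged sketch (the FastCopy name is mine):
public static class DataTableCopyExtensions
{
    // Sketch: copy structure with Clone(), then stream the rows in via a DataTableReader.
    public static DataTable FastCopy(this DataTable source)
    {
        var copy = source.Clone();
        copy.Load(source.CreateDataReader());
        return copy;
    }
}
With that in place, the loop above reduces to aCopy.Add(a[i].FastCopy());.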
This was many times faster (around 6x in my use case) than the following:
DataTable copyTable = sourceTable.Clone();
foreach(DataRow dr in sourceTable.Rows)
{
copyTable.ImportRow(dr);
}
Also, if we look at what DataTable.Copy is doing using ILSpy:
public DataTable Copy()
{
IntPtr intPtr;
Bid.ScopeEnter(out intPtr, "<ds.DataTable.Copy|API> %d#\n", this.ObjectID);
DataTable result;
try
{
DataTable dataTable = this.Clone();
foreach (DataRow row in this.Rows)
{
this.CopyRow(dataTable, row);
}
result = dataTable;
}
finally
{
Bid.ScopeLeave(ref intPtr);
}
return result;
}
internal void CopyRow(DataTable table, DataRow row)
{
int num = -1;
int newRecord = -1;
if (row == null)
{
return;
}
if (row.oldRecord != -1)
{
num = table.recordManager.ImportRecord(row.Table, row.oldRecord);
}
if (row.newRecord != -1)
{
if (row.newRecord != row.oldRecord)
{
newRecord = table.recordManager.ImportRecord(row.Table, row.newRecord);
}
else
{
newRecord = num;
}
}
DataRow dataRow = table.AddRecords(num, newRecord);
if (row.HasErrors)
{
dataRow.RowError = row.RowError;
DataColumn[] columnsInError = row.GetColumnsInError();
for (int i = 0; i < columnsInError.Length; i++)
{
DataColumn column = dataRow.Table.Columns[columnsInError[i].ColumnName];
dataRow.SetColumnError(column, row.GetColumnError(columnsInError[i]));
}
}
}
It's not surprising that the operation will take a long time; not only is it row by row, but it also does additional validation.
You should specify the capacity of the list; otherwise it will have to grow internally to accommodate the data. See here for the detailed explanation.
List<DataTable> aCopy = new List<DataTable>(a.Count);
I found the following approach much more efficient than other ways of filtering records (such as LINQ), provided your search criteria are simple:
public static DataTable FilterByEntityID(this DataTable table, int EntityID)
{
table.DefaultView.RowFilter = "EntityId = " + EntityID.ToString();
return table.DefaultView.ToTable();
}
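A hedged usage sketch (myTable and its EntityId column are assumptions for illustration):
// Sketch: returns a new DataTable containing only the rows where EntityId = 42.
DataTable onlyEntity42 = myTable.FilterByEntityID(42);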
Update: Here's a similar question
Suppose I have a DataTable with a few thousand DataRows in it.
I'd like to break up the table into chunks of smaller rows for processing.
I thought C#3's improved ability to work with data might help.
This is the skeleton I have so far:
DataTable Table = GetTonsOfData();
// Chunks should be any IEnumerable<Chunk> type
var Chunks = ChunkifyTableIntoSmallerChunksSomehow; // ** help here! **
foreach(var Chunk in Chunks)
{
// Chunk should be any IEnumerable<DataRow> type
ProcessChunk(Chunk);
}
Any suggestions on what should replace ChunkifyTableIntoSmallerChunksSomehow?
I'm really interested in how someone would do this with access to C#3 tools. If attempting to apply these tools is inappropriate, please explain!
Update 3 (revised chunking as I really want tables, not IEnumerables; going with an extension method--thanks Jacob):
Final implementation:
Extension method to handle the chunking:
public static class HarenExtensions
{
public static IEnumerable<DataTable> Chunkify(this DataTable table, int chunkSize)
{
for (int i = 0; i < table.Rows.Count; i += chunkSize)
{
DataTable Chunk = table.Clone();
foreach (DataRow Row in table.Select().Skip(i).Take(chunkSize))
{
Chunk.ImportRow(Row);
}
yield return Chunk;
}
}
}
Example consumer of that extension method, with sample output from an ad hoc test:
class Program
{
static void Main(string[] args)
{
DataTable Table = GetTonsOfData();
foreach (DataTable Chunk in Table.Chunkify(100))
{
Console.WriteLine("{0} - {1}", Chunk.Rows[0][0], Chunk.Rows[Chunk.Rows.Count - 1][0]);
}
Console.ReadLine();
}
static DataTable GetTonsOfData()
{
DataTable Table = new DataTable();
Table.Columns.Add(new DataColumn());
for (int i = 0; i < 1000; i++)
{
DataRow Row = Table.NewRow();
Row[0] = i;
Table.Rows.Add(Row);
}
return Table;
}
}
This is quite readable and only iterates through the sequence once, perhaps saving you the rather bad performance characteristics of repeated redundant Skip() / Take() calls:
public IEnumerable<IEnumerable<DataRow>> Chunkify(DataTable table, int size)
{
List<DataRow> chunk = new List<DataRow>(size);
foreach (DataRow row in table.Rows)
{
chunk.Add(row);
if (chunk.Count == size)
{
yield return chunk;
chunk = new List<DataRow>(size);
}
}
if(chunk.Any()) yield return chunk;
}
This seems like an ideal use-case for Linq's Skip and Take methods, depending on what you want to achieve with the chunking. This is completely untested and never entered into an IDE, but your method might look something like this.
private List<List<DataRow>> ChunkifyTable(DataTable table, int chunkSize)
{
List<List<DataRow>> chunks = new List<List<DataRow>>();
for (int i = 0; i < table.Rows.Count / chunkSize; i++)
{
chunks.Add(table.AsEnumerable().Skip(i * chunkSize).Take(chunkSize).ToList());
}
return chunks;
}
Here's an approach that might work:
public static class Extensions
{
public static IEnumerable<IEnumerable<T>> InPages<T>(this IEnumerable<T> enumOfT, int pageSize)
{
if (null == enumOfT) throw new ArgumentNullException("enumOfT");
if (pageSize < 1) throw new ArgumentOutOfRangeException("pageSize");
var enumerator = enumOfT.GetEnumerator();
while (enumerator.MoveNext())
{
yield return InPagesInternal(enumerator, pageSize);
}
}
private static IEnumerable<T> InPagesInternal<T>(IEnumerator<T> enumeratorOfT, int pageSize)
{
var count = 0;
while (true)
{
yield return enumeratorOfT.Current;
if (++count >= pageSize) yield break;
if (false == enumeratorOfT.MoveNext()) yield break;
}
}
public static string Join<T>(this IEnumerable<T> enumOfT, object separator)
{
var sb = new StringBuilder();
if (enumOfT.Any())
{
sb.Append(enumOfT.First());
foreach (var item in enumOfT.Skip(1))
{
sb.Append(separator).Append(item);
}
}
return sb.ToString();
}
}
[TestFixture]
public class Tests
{
[Test]
public void Test()
{
// Arrange
var ints = new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
var expected = new[]
{
new[] { 1, 2, 3 },
new[] { 4, 5, 6 },
new[] { 7, 8, 9 },
new[] { 10 },
};
// Act
var pages = ints.InPages(3);
// Assert
var expectedString = (from x in expected select x.Join(",")).Join(" ; ");
var pagesString = (from x in pages select x.Join(",")).Join(" ; ");
Console.WriteLine("Expected : " + expectedString);
Console.WriteLine("Pages : " + pagesString);
Assert.That(pagesString, Is.EqualTo(expectedString));
}
}
Jacob wrote:
This seems like an ideal use-case for Linq's Skip and Take methods, depending on what you want to achieve with the chunking. This is completely untested and never entered into an IDE, but your method might look something like this.
private List<List<DataRow>> ChunkifyTable(DataTable table, int chunkSize)
{
List<List<DataRow>> chunks = new List<List<DataRow>>();
for (int i = 0; i < table.Rows.Count / chunkSize; i++)
{
chunks.Add(table.AsEnumerable().Skip(i * chunkSize).Take(chunkSize).ToList());
}
return chunks;
}
Thanks for this Jacob - useful for me, but I think the test in your example should be <= not <. If you use < and the number of rows is less than chunkSize, the loop is never entered. Similarly, the last partial chunk is not captured, only full chunks. As you've stated, the example is untested, etc., so this is just an FYI in case someone else uses your code verbatim ;-)
Here is a completely different approach. No memory is allocated for the chunks.
public static IEnumerable<IEnumerable<DataRow>> Chunkify(
this DataTable dataTable, int chunkSize)
{
for (int i = 0; i < dataTable.Rows.Count; i += chunkSize)
{
yield return GetChunk(i, Math.Min(i + chunkSize, dataTable.Rows.Count));
}
IEnumerable<DataRow> GetChunk(int from, int toExclusive)
{
for (int j = from; j < toExclusive; j++)
{
yield return dataTable.Rows[j];
}
}
}
Usage example:
var dataTable = GetTonsOfData();
foreach (var chunk in dataTable.Chunkify(1000))
{
Console.WriteLine($"Processing chunk of {chunk.Count()} rows");
foreach (var dataRow in chunk)
{
Console.WriteLine(dataRow[0]);
}
}
.NET (Core) 6 introduced the Chunk extension method that can be used to easily split a DataTable into batches:
IEnumerable<DataRow[]> chunks=myTable.AsEnumerable()
.Chunk(1000);
In earlier versions MoreLINQ's Batch extension method can be used to do the same:
IEnumerable<IEnumerable<DataRow>> chunks=myTable.AsEnumerable()
.Batch(1000);
Both can be used to split a DataTable into smaller ones. The following extension method does this, using a LoadRows helper to extract the row-loading code:
public static IEnumerable<DataTable> Chunk(this DataTable source, int size)
{
ArgumentNullException.ThrowIfNull(source);
foreach (var chunk in source.AsEnumerable().Chunk(size))
{
var chunkTable = source.Clone();
chunkTable.MinimumCapacity = size;
chunkTable.LoadRows(chunk);
yield return chunkTable;
}
}
public static DataTable LoadRows(this DataTable table, IEnumerable<DataRow> rows)
{
ArgumentNullException.ThrowIfNull(table);
ArgumentNullException.ThrowIfNull(rows);
foreach (var row in rows)
{
table.ImportRow(row);
}
return table;
}
ArgumentNullException.ThrowIfNull(source); is another .NET (Core) addition that throws an ArgumentNullException using the parameter name if the argument is null.
Finally, chunkTable.MinimumCapacity = size; is used to reserve space for each table's rows, to avoid reallocations.