I am trying to delete rows from a DataTable that have an empty or null cell. At the same time, I check whether the empty cells in a column exceed a given percentage; if they do, I drop the whole column. I tried proceeding like so:
private DataTable CleanData()
{
var dt = BindData(openFileDialog1.FileName);
for (var j = dt.Columns.Count-1; j >= 0; j--)
{
short count = 0;
for (var i = dt.Rows.Count - 1; i >= 0; i--)
{
if (!string.IsNullOrEmpty(dt.Rows[i][j].ToString())) continue;
count++;
}
var percentage = count * 100.0 / dt.Rows.Count;
if (percentage > 10)
{
dt.Columns.RemoveAt(j);
textFile.Text += " " + j + " ";
}
}
dt.AcceptChanges();
for (var j = dt.Columns.Count - 1; j >= 0; j--)
for (var i = dt.Rows.Count - 1; i >= 0; i--)
{
if (!string.IsNullOrEmpty(dt.Rows[i][j].ToString())) continue;
dt.Rows[i].Delete();
}
dt.AcceptChanges();
return dt;
}
I loop over the DataTable cells a first time, checking the percentage of empty cells in each column and deleting the column if it exceeds 10%. Then I loop a second time and delete each row that has an empty cell. On the second loop I get an error (System.Data.DeletedRowInaccessibleException) when it reaches a deleted column index, even though it should be looping over a DataTable where those columns no longer exist.
Any clue where I messed up?
Edit: I made the proposed changes but I am still getting the same error.
What I THINK you are running into is an unexpected side-effect of your loop checking % and deleting columns. You are starting with the 0-index column (1st column). Checking and then deleting if empty. Do it in reverse... start with the LAST column and work back to 0 and here is why.
Say you start with a table of 3 columns, so your loop counter is intended to go 0, 1, 2. First cycle through, loop counter 0. You determine the data is good, no delete. Counter = 1 (2nd column). You determine it needs to be removed due to % empty, so you delete column[1]. What WAS column[2] now becomes column[1], and your counter advances to 2. You never checked what WAS the third column.
If you go in reverse, you start at column[2], check it, and find it's OK (or not, it doesn't matter). Now down 1 to column[1], which you determine needs removing. It gets deleted and what was column[2] becomes column[1], but you have already checked that one. Now you check column[0] and finish with no problem.
You are already doing this when checking the ROWS (starting at the end and working back). Same principle applies.
As for your loop on deleting the ROW, I would invert your loops.
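// Outer loop per ROW (last row first, working back)
for (var i = dt.Rows.Count - 1; i >= 0; i--)
{
    // Inner loop per COLUMN
    for (var j = 0; j < dt.Columns.Count; j++)
    {
        // If any single column qualifies, delete the row
        if (string.IsNullOrEmpty(dt.Rows[i][j].ToString()))
        {
            dt.Rows[i].Delete();
            break; // break out of the column-checking loop
        }
    }
    // continue with the next ROW
}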
Since your existing outer loop is per column, if you process column 1 and delete row 5, then get to column 2 and try to delete row 5 again, that is your failure.
By checking all columns for a single row FIRST and getting out as soon as one qualifies for deletion, you are done with that row and never need to consider looking at any other columns. Move to the next row for processing.
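Putting both suggestions together, here is a minimal sketch of what the cleaned-up method might look like. The class name `DataTableCleaner`, the method `Clean`, the helper `IsEmpty`, and the `maxEmptyPercent` parameter are names made up for illustration; the original method loads its table through `BindData`, which is not reproduced here. It assumes the table contains no rows already marked as deleted when it is called.

```csharp
using System;
using System.Data;

public static class DataTableCleaner
{
    // Hypothetical helper: drops columns whose share of empty cells exceeds
    // maxEmptyPercent, then drops every row that still has an empty cell.
    public static void Clean(DataTable dt, double maxEmptyPercent)
    {
        // Pass 1: walk columns from last to first so that RemoveAt never
        // shifts a column that has not been checked yet.
        for (var j = dt.Columns.Count - 1; j >= 0 && dt.Rows.Count > 0; j--)
        {
            var empty = 0;
            foreach (DataRow row in dt.Rows)
            {
                if (IsEmpty(row[j])) empty++;
            }
            if (empty * 100.0 / dt.Rows.Count > maxEmptyPercent)
            {
                dt.Columns.RemoveAt(j);
            }
        }

        // Pass 2: rows in the outer loop, columns in the inner loop.
        // Each row is deleted at most once and never read again afterwards.
        for (var i = dt.Rows.Count - 1; i >= 0; i--)
        {
            var row = dt.Rows[i];
            for (var j = 0; j < dt.Columns.Count; j++)
            {
                if (!IsEmpty(row[j])) continue;
                row.Delete();
                break; // stop checking columns for this row
            }
        }
        dt.AcceptChanges();
    }

    private static bool IsEmpty(object value) =>
        value == null || value == DBNull.Value || string.IsNullOrEmpty(value.ToString());
}
```

Calling `DataTableCleaner.Clean(dt, 10)` would correspond to the 10% threshold used in the question.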
Related
I am trying to write a method that partitions a DataTable based on a given condition (delegate). My problem is that the condition I'm using always partitions exactly half the DataTable. The condition seems to resolve true for half the DataRows even when NO DataRows should resolve to true.
The method looks like this:
private DataTable PartitionDataTable(DataTable data, Func<DataRow, bool> condition) {
DataTable removedRows = data.Clone();
for(int i = 0; i < data.Rows.Count; i++) {
if(condition(data.Rows[i])){
removedRows.ImportRow(data.Rows[i]);
data.Rows.Remove(data.Rows[i]);
}
}
return removedRows;
}
I call this method using this condition:
DataTable removed = PartitionDataTable(data, row => DateTimeOffset.Parse(row["timestamp"].ToString()) < baselineTimestamp);
If the highest/max timestamp in the data object (DataTable) is a few minutes earlier than the 'baselineTimestamp', determined using data.Compute("max([timestamp])", String.Empty), then half the records are still partitioned and removed when none of them should be because all of them are < baselineTimestamp.
No idea what's going on. Please help me. The goal is to partition out the DataRows with timestamps earlier than a given timestamp (to the nearest millisecond).
You are removing rows as you iterate over the dataset. So if i=2, then you remove row 2, and row 3 is now row 2. You then increment i, operating on the new row 3 (which was row 4) so you skip the original row 3 altogether.
One trick to resolve this is to iterate backwards since the rows that are shifted are ones that you've already processed:
for(int i = data.Rows.Count-1; i >= 0; i--) {
if(condition(data.Rows[i])){
removedRows.ImportRow(data.Rows[i]);
data.Rows.Remove(data.Rows[i]);
}
}
return removedRows;
}
Every time you remove a row from data, its Rows.Count decreases.
Suppose data contains 10 rows at the start and every row matches the condition.
During the first five iterations (i = 0 to 4) you remove 5 rows from data.
On the next check, i is 5 and data.Rows.Count is also 5, so the loop terminates.
Since you are removing elements from an array, you have to move backward. If you do it forward, you'll skip half of the elements, and this is why you get half of them back:
DataTable removedRows = data.Clone();
for(int i = data.Rows.Count-1; i >= 0 ; i--) {
if(condition(data.Rows[i])){
removedRows.ImportRow(data.Rows[i]);
data.Rows.Remove(data.Rows[i]);
}
}
return removedRows;
}
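For reference, a complete version of the backward-iterating method could look like the sketch below. The class name `DataTablePartitioner` and the method name `Partition` are made up for this illustration. Note that because the source table is walked from the end, the rows land in the returned table in reverse order.

```csharp
using System;
using System.Data;

public static class DataTablePartitioner
{
    // Moves every row that satisfies the condition out of 'data' and into a
    // new table with the same schema. Iterating from the last index down means
    // a removal only shifts rows that have already been visited.
    public static DataTable Partition(DataTable data, Func<DataRow, bool> condition)
    {
        DataTable removedRows = data.Clone(); // same columns, no rows
        for (int i = data.Rows.Count - 1; i >= 0; i--)
        {
            DataRow row = data.Rows[i];
            if (!condition(row)) continue;
            removedRows.ImportRow(row); // copy first, while the row is still attached
            data.Rows.Remove(row);
        }
        return removedRows;
    }
}
```

With this version, a condition that is true for every row empties the source table completely instead of stopping halfway.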
I am trying to export data from c# to excel using the following code:
worksheet = workbook.ActiveSheet;
worksheet.Name = "ExportedFromDatGrid";
//Loop through each row and read value from each column.
for (int i = 0; i < dataGridView1.Rows.Count + 1; i++)
{
worksheet.Cells[1, i] = dataGridView1.Columns[i - 1].HeaderText;
}
for (int i = 0; i < dataGridView1.Columns.Count; i++)
{
for (int j = 0; j < dataGridView1.Columns.Count - 1; j++)
{
// Excel index starts from 1,1. As first Row would have the Column headers,
// adding a condition check.
worksheet.Cells[i + 2, j + 1] = dataGridView1.Rows[i].Cells[j].Value.ToString();
}
}
I get the following error:
Index was out of range. Must be non-negative and less than the size of the collection. Parameter name: index.
UPDATE I solved the problem by changing this for statement:
for ( int i = -1; i < DataGridView1.Columns.Count; i++)
I think the problem is that many online guides and tutorials explain that when you count through a List<>, an array, or the rows/columns of a table, you need to add +1 because all of these containers start at index 0.
As a newcomer, it can be hard to figure out where and when the +1 belongs. Maybe you were confused because you wanted the total number of rows as the upper bound for i. But since your loop starts at int i = 0 (which is correct, because you don't want to skip the row with index 0), there is no need to add +1 to the upper bound: the loop already runs dataGridView1.Rows.Count times, once per row.
The exception "Index was out of range" tells you that you tried to access an item at an index that doesn't exist. Say you have 10 rows with indexes 0 to 9. Starting at 0, after 10 iterations you have gone through rows 0 to 9. dataGridView1.Rows.Count gives you the total number of rows, 10 in this example. But you set the upper bound to dataGridView1.Rows.Count + 1, so the loop tries to run an 11th time with index 10, while the index of your last row is 9. It can't find that row, and that is when you get the Index out of range exception. Note also that your header loop reads Columns[i - 1], which is index -1 on the first pass when i starts at 0. I hope this makes clear what went wrong and why.
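To make the index arithmetic concrete, here is a small sketch that uses plain collections instead of the real controls. The `GridExport` class, the `ToCells` method, and the dictionary standing in for `worksheet.Cells` are all assumptions for illustration; only the loop bounds and the 0-based to 1-based mapping are the point.

```csharp
using System;
using System.Collections.Generic;

public static class GridExport
{
    // Stand-in for worksheet.Cells: keys are 1-based (row, column) pairs,
    // mirroring how Excel addresses cells. 'headers' and 'rows' stand in for
    // the grid's column headers and row values, which are 0-based.
    public static Dictionary<(int Row, int Col), string> ToCells(string[] headers, string[][] rows)
    {
        var cells = new Dictionary<(int Row, int Col), string>();

        // Headers: loop over COLUMNS from 0 to Length - 1; the +1 is applied
        // only on the target side, never to the loop's upper bound.
        for (int j = 0; j < headers.Length; j++)
        {
            cells[(1, j + 1)] = headers[j];
        }

        // Data: the outer loop is bounded by the ROW count, the inner loop by
        // the COLUMN count. Row 1 holds the headers, so data starts at row 2.
        for (int i = 0; i < rows.Length; i++)
        {
            for (int j = 0; j < headers.Length; j++)
            {
                cells[(i + 2, j + 1)] = rows[i][j] ?? string.Empty;
            }
        }
        return cells;
    }
}
```

The source index always runs from 0 up to (but not including) the count, and the offset is added only where the 1-based target is addressed.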
Try:
sheet.GetRow(rowNumber).CreateCell(columnNumber);
And then fill the cell value.
I'm trying to get the count of rows which don't have a value in any of their columns.
Sample image of the Excel file I'm using:
The highlighted rows have values in some columns; the rest of the rows are blank, and I need to count those rows.
I already used this method
int blankRows = 0;
double notEmpty = 1;
while (notEmpty > 0)
{
string aCellAddress = "A" + (rowIndex++).ToString();
Excel.Range row = excelApp.get_Range(aCellAddress, aCellAddress).EntireRow;
notEmpty = excelApp.WorksheetFunction.CountA(row);
if (notEmpty <= 0)
{
blankRows++;
}
}
but this is a very time-consuming process when the file is large and contains only a few blank rows.
One thing that might help would be to find the last column that has data and last row that has data as to limit your search.
This is VBA code snippet, but could be easily transformed to C#:
'iterate through columns to determine which is longest to determine the highest row number.
For i = 1 To 16384 'number of columns in excel
'get the row
rowcount = ws.Cells(Rows.Count, i).End(xlUp).Row
'check to see if it's larger than what it is now, if it is, set the value of lRow.
If rowcount > lrow Then
lrow = rowcount
End If
Next
Then use a similar loop to get the last column: step through each row up to the last row found above and find the last column that has data.
You can use those values to limit the range that you're looking through. I'm not sure if it will be any faster, but it might help.
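Once the search area is limited, one option (not part of the suggestion above, so treat it as an assumption) is to read the values of that range into an in-memory `object[,]` in a single call and count the blank rows there, instead of calling CountA once per row. The sketch below shows only the in-memory counting step; `BlankRowCounter` and `CountBlankRows` are hypothetical names, and how the array is obtained from Excel is left out.

```csharp
using System;

public static class BlankRowCounter
{
    // Counts rows in which every cell is null or an empty string.
    // Uses GetLowerBound/GetUpperBound so it works for both 0-based arrays
    // and arrays whose indexes start at 1.
    public static int CountBlankRows(object[,] values)
    {
        int blank = 0;
        for (int r = values.GetLowerBound(0); r <= values.GetUpperBound(0); r++)
        {
            bool hasValue = false;
            for (int c = values.GetLowerBound(1); c <= values.GetUpperBound(1); c++)
            {
                object v = values[r, c];
                if (v != null && v.ToString().Length > 0)
                {
                    hasValue = true;
                    break; // one value is enough; the row is not blank
                }
            }
            if (!hasValue) blank++;
        }
        return blank;
    }
}
```

Whether this is actually faster for a given file would need to be measured.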
Trying to iterate through the rows and cells on an excel spreadsheet, deleting empty ones. I'm using the following routine to do so.
foreach(Range row in sheet.UsedRange.Rows)
{
for (int i = 0; i < row.Columns.Count; i++)
{
Range cell = row.Cells[1, i + 1];
if (cell.Value == null || String.IsNullOrEmpty(cell.Value.ToString()))
{
cell.Delete();
}
}
}
Which works fine for the first two rows. However, it then seems to go haywire.
The third row is completely empty. Yet as it iterates through the columns, when this loop gets to column "I", it reads a value there. The value is what's actually in row 4, column "J".
After that, it just gets worse, missing whole rows and reading incorrect values from the rows it does find.
I am baffled by this. Is there something obvious that I have missed?
Yes, you are missing something very obvious. You are deleting cells. After that operation, your calculation of which cell to pick doesn't work any more.
If you delete a cell, the cells below it move up. That makes your row.Cells[1, i + 1] lookup incorrect. For example, if you delete one cell in row 2, the value from the same column in row 3 never gets checked, because it has moved into row 2.
The direction of shift on deletion may also be a factor - you can control it by passing a parameter to the Delete function.
Simply recheck the same column when you delete one:
foreach (Range row in Globals.ThisAddIn.Application.ActiveWorkbook.ActiveSheet.UsedRange.Rows)
{
for (int i = 0; i < row.Columns.Count; i++)
{
Range cel = row.Cells[1, i + 1];
if (cel.Value == null || String.IsNullOrEmpty(cel.Value.ToString()))
{
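// default shift is up
cel.Delete();
// to shift left use cel.Delete(XlDeleteShiftDirection.xlShiftToLeft);
i--; // recheck this position, since another cell has shifted into it
// (caution: this never terminates if only empty cells are left to shift in)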
}
}
}
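The "recheck the same index" idea can be tried out without Excel. The sketch below uses a `List<string>` as a stand-in for one row with shift-left deletion; `RowCompactor` and `RemoveEmpty` are hypothetical names, and the list is only an analogy for how cells shift, not the interop API.

```csharp
using System.Collections.Generic;

public static class RowCompactor
{
    // Removes null or empty entries in place. After RemoveAt(i), the element
    // that used to be at i + 1 now sits at i, so the index is stepped back to
    // examine it on the next pass. The loop terminates because the list
    // shrinks by one on every removal.
    public static void RemoveEmpty(List<string> row)
    {
        for (int i = 0; i < row.Count; i++)
        {
            if (!string.IsNullOrEmpty(row[i])) continue;
            row.RemoveAt(i);
            i--; // recheck the element that shifted into this position
        }
    }
}
```

Unlike the list, a worksheet row does not get shorter when a cell is deleted, which is why the Excel version needs extra care with trailing empty cells.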
I have a problem where I can either update by one row (and that's it) or by four at once.
The issue is with tbl.Rows.AddAt(tbl.Rows.Count - 1, tr1) on the last two lines, comments are indicating what happens when the statement is placed there.
There will be a limit of 6, but I can't figure out why I can't update one row at a time more than once with the code I've got.
The for loops are only allowing four cells per row and no more than 4 rows (not including the initial row at start).
Can you point me in the right direction please?
C#:
public void addRows_Click1(object sender, EventArgs e)
{
rmvRows.Visible = true;
// rows
for (int rowCount = 0; rowCount < 4; rowCount++ )
{
tr1 = new TableRow();
// cells
for (int cellCount = 0; cellCount < 4; cellCount++)
{
tc1 = new TableCell();
tb1 = new TextBox();
tb1.ID = "tbID" + cellCount.ToString();
tc1.Controls.Add(tb1);
tr1.Cells.Add(tc1);
}
tbl.Rows.AddAt(tbl.Rows.Count - 1, tr1);// will add four more rows
}
tbl.Rows.AddAt(tbl.Rows.Count - 1, tr1)// adds one one but no more
}
This line:
tbl.Rows.AddAt(tbl.Rows.Count - 1, tr1);// will add four more rows
is within a for loop that is running 4 times (from 0 to 3):
for (int rowCount = 0; rowCount < 4; rowCount++ )
{
tr1 = new TableRow();
/* code omitted */
tbl.Rows.AddAt(tbl.Rows.Count - 1, tr1);// will add four more rows
}
The line itself only adds one row each time it is called, but it is getting called 4 times. Therefore, the table has 4 additional rows after the for loop's execution.
As for your second line:
tbl.Rows.AddAt(tbl.Rows.Count - 1, tr1)// adds one one but no more
This is outside of any loop, so, like any other normal code, it only executes once. Since it only executes once, it only adds one row.
Additionally, since tr1 is not changed after the for loop, the final row added by that last line is going to be a duplicate of whatever the last row created by the loop was.
tbl.Rows.AddAt(tbl.Rows.Count - 1, tr1)
Only adds one row regardless of where it is called, because tr1 is just one row.
AddAt only allows one row to be added at a certain position.
The AddAt used in the loop should work just fine.
Hope that helps!