I'm trying to iterate through the rows and cells of an Excel spreadsheet, deleting the empty ones. I'm using the following routine to do so.
foreach(Range row in sheet.UsedRange.Rows)
{
for (int i = 0; i < row.Columns.Count; i++)
{
Range cell = row.Cells[1, i + 1];
if (cell.Value == null || String.IsNullOrEmpty(cell.Value.ToString()))
{
cell.Delete();
}
}
}
This works fine for the first two rows. However, it then seems to go haywire.
The third row is completely empty. Yet as it iterates through the columns, when this loop gets to column "I", it reads a value there. The value is what's actually in row 4, column "J".
After that, it just gets worse, missing whole rows and reading incorrect values from the rows it does find.
I am baffled by this. Is there something obvious that I have missed?
Yes, you are missing something: you are deleting cells while you iterate over them. After a deletion, your calculation of which cell to pick no longer matches the sheet.
When you delete a cell with the default shift, the cells below it move up. That causes your row.Cells[1, i + 1] to point at a different cell than you expect. For example, if you delete one cell in row 2, the value of the cell in the same column in row 3 will never get checked, because by then it sits in row 2.
The direction of the shift on deletion may also be a factor; you can control it by passing a parameter to the Delete function.
Simply recheck the same column when you delete one:
foreach (Range row in Globals.ThisAddIn.Application.ActiveWorkbook.ActiveSheet.UsedRange.Rows)
{
for (int i = 0; i < row.Columns.Count; i++)
{
Range cel = row.Cells[1, i + 1];
if (cel.Value == null || String.IsNullOrEmpty(cel.Value.ToString()))
{
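// default shift is up
cel.Delete();
// to shift left use cel.Delete(XlDeleteShiftDirection.xlShiftToLeft);
i--; // recheck the same column, which now holds the cell that moved in
// caution: if every cell that moves in is also empty, this never advances, so consider capping the retries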
}
}
}
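The "step the index back after a delete" idea can be tried without Excel. The following is a minimal sketch that uses a List<string> as a stand-in for one line of cells; the ShiftDemo class and RemoveBlanks method are hypothetical names, not part of the Interop API. Unlike a worksheet, the list gets shorter with every removal, so this loop always terminates. On a real sheet an empty cell can keep sliding into the same position, which is why a retry limit may be needed there.

```csharp
using System;
using System.Collections.Generic;

public static class ShiftDemo
{
    // Removes blank entries in place while iterating forward.
    // After RemoveAt(i) the next element slides into slot i, so the index
    // is stepped back to make the loop examine that slot again.
    public static List<string> RemoveBlanks(List<string> cells)
    {
        for (int i = 0; i < cells.Count; i++)
        {
            if (string.IsNullOrEmpty(cells[i]))
            {
                cells.RemoveAt(i);
                i--; // recheck the same slot on the next pass
            }
        }
        return cells;
    }

    public static void Main()
    {
        var row = new List<string> { "a", "", "", "b", null, "c" };
        Console.WriteLine(string.Join(",", RemoveBlanks(row))); // a,b,c
    }
}
```

Without the i-- line, the second of the two adjacent blanks would be skipped, which is the same symptom as the unchecked cell described above.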
Related
I am trying to delete rows from a DataTable that have an empty or null cell. At the same time, I check whether the share of empty cells in a column exceeds a given percentage; if it does, I drop the whole column. I tried proceeding like so:
private DataTable CleanData()
{
var dt = BindData(openFileDialog1.FileName);
for (var j = dt.Columns.Count-1; j >= 0; j--)
{
short count = 0;
for (var i = dt.Rows.Count - 1; i >= 0; i--)
{
if (!string.IsNullOrEmpty(dt.Rows[i][j].ToString())) continue;
count++;
}
var percentage = count * 100.0 / dt.Rows.Count;
if (percentage > 10)
{
dt.Columns.RemoveAt(j);
textFile.Text += " " + j + " ";
}
}
dt.AcceptChanges();
for (var j = dt.Columns.Count - 1; j >= 0; j--)
for (var i = dt.Rows.Count - 1; i >= 0; i--)
{
if (!string.IsNullOrEmpty(dt.Rows[i][j].ToString())) continue;
dt.Rows[i].Delete();
}
dt.AcceptChanges();
return dt;
}
I loop over the DataTable cells a first time, checking the percentage of empty cells in each column, and delete the column if it exceeds 10%. Then I loop a second time and delete each row that has an empty cell. On the second loop I get a System.Data.DeletedRowInaccessibleException when it reaches the index of a deleted column, even though it is supposed to be looping over a DataTable where those columns no longer exist.
Any clue where I messed up?
Edit: I made the proposed changes but I'm still getting the same error.
What I THINK you are running into is an unexpected side effect of the loop that checks the percentage and deletes columns. You are starting with the 0-index column (the first column), checking it, and deleting it if it is too empty. Do it in reverse: start with the LAST column and work back to 0. Here is why.
Say you start with a table of 3 columns, so your loop counter is intended to go 0, 1, 2. First cycle through, the loop counter is 0. You determine the data is good, so no delete. The counter becomes 1 (the second column), and you determine it needs to be removed because of its percentage of empty cells. Now you delete column[1]. What WAS column[2] becomes column[1], and your counter advances to 2. You never checked what WAS the third column.
If you go in reverse, you start at column[2], check it, and find it is OK (or not, it doesn't matter). Then you go down 1 to column[1] and determine it should be removed. It gets deleted, and what was column[2] becomes column[1], but you have already checked it. Now you check column[0] and finish with no problem.
You are already doing this when checking the ROWS (starting at the end and working back). The same principle applies.
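The difference between the two directions can be reproduced on a small DataTable. This is only a sketch: the column names and the "starts with drop" rule are made up to stand in for the percentage check, and ColumnPruneDemo is a hypothetical class name.

```csharp
using System;
using System.Data;
using System.Linq;

public static class ColumnPruneDemo
{
    // Builds a table with three columns; the names are illustrative only.
    public static DataTable Build()
    {
        var dt = new DataTable();
        dt.Columns.Add("keep0");
        dt.Columns.Add("drop1");
        dt.Columns.Add("drop2");
        return dt;
    }

    // Stand-in for the "too many empty cells" test.
    private static bool ShouldDrop(DataColumn c) =>
        c.ColumnName.StartsWith("drop", StringComparison.Ordinal);

    // Forward pass: after RemoveAt(j) the next column slides into slot j,
    // and j++ then steps over it without checking it.
    public static string[] PruneForward(DataTable dt)
    {
        for (int j = 0; j < dt.Columns.Count; j++)
            if (ShouldDrop(dt.Columns[j])) dt.Columns.RemoveAt(j);
        return Names(dt);
    }

    // Reverse pass: removing slot j only moves columns that were already checked.
    public static string[] PruneReverse(DataTable dt)
    {
        for (int j = dt.Columns.Count - 1; j >= 0; j--)
            if (ShouldDrop(dt.Columns[j])) dt.Columns.RemoveAt(j);
        return Names(dt);
    }

    private static string[] Names(DataTable dt) =>
        dt.Columns.Cast<DataColumn>().Select(c => c.ColumnName).ToArray();

    public static void Main()
    {
        Console.WriteLine(string.Join(",", PruneForward(Build()))); // keep0,drop2
        Console.WriteLine(string.Join(",", PruneReverse(Build()))); // keep0
    }
}
```

The forward pass leaves drop2 behind because it moved into the slot that had just been examined.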
As for your loop on deleting the ROW, I would invert your loops.
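for (var i = dt.Rows.Count - 1; i >= 0; i--) // outer loop per ROW (last row first, working back)
{
for (var j = 0; j < dt.Columns.Count; j++) // inner loop per COLUMN
{
// any single empty cell qualifies the row for deletion
if (string.IsNullOrEmpty(dt.Rows[i][j].ToString()))
{
dt.Rows[i].Delete();
break; // break out of the column checking loop
}
}
// continue with the next ROW
}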
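Since your existing outer loop is per column, if you process column 1 and delete row 5, then get to column 2 and touch row 5 again, that is your failure: until AcceptChanges is called the row is only marked as deleted, and reading its values throws DeletedRowInaccessibleException.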
By checking all columns for a single row FIRST and getting out as soon as one qualifies for deletion, you are done with that row and never need to consider looking at any other columns. Move to the next row for processing.
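Here is a small self-contained sketch of the row-first approach, run against made-up data. The RowPruneDemo class, the two columns and the three rows are assumptions for illustration only. AcceptChanges is called while building the table so that the rows are in the Unchanged state, as they are after the column pass in the question; Delete() then only marks a row, and the break guarantees that a marked row is never read again.

```csharp
using System;
using System.Data;

public static class RowPruneDemo
{
    // Illustrative data: the second and third rows each contain a blank cell.
    public static DataTable Build()
    {
        var dt = new DataTable();
        dt.Columns.Add("A");
        dt.Columns.Add("B");
        dt.Rows.Add("1", "2");
        dt.Rows.Add("", "x");
        dt.Rows.Add("3", DBNull.Value);
        dt.AcceptChanges(); // rows are now Unchanged, so Delete() only marks them
        return dt;
    }

    // Deletes every row that has at least one empty cell; returns how many were deleted.
    public static int DeleteRowsWithBlanks(DataTable dt)
    {
        int deleted = 0;
        for (int i = dt.Rows.Count - 1; i >= 0; i--)     // rows, last to first
        {
            for (int j = 0; j < dt.Columns.Count; j++)   // columns of this row
            {
                if (string.IsNullOrEmpty(dt.Rows[i][j].ToString()))
                {
                    dt.Rows[i].Delete();
                    deleted++;
                    break; // done with this row; never read it again
                }
            }
        }
        dt.AcceptChanges(); // physically remove the marked rows
        return deleted;
    }

    public static void Main()
    {
        var dt = Build();
        Console.WriteLine(DeleteRowsWithBlanks(dt)); // 2
        Console.WriteLine(dt.Rows.Count);            // 1
    }
}
```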
I'm using Visual Studio 2017, C#, and Windows Forms to create an index of the words in a list of sentences.
I have two DataGridView controls:
dataGridView2: This grid has a single column where each row contains a sentence.
dGvTopics: This grid has one column for every word that is repeated in the first sentence (first row) of dataGridView2; the column header text is the word.
Goal: When I click a button to categorize, I want to insert a row in dGvTopics for each row (sentence) in dataGridView2, and place a copy of the sentence in each column whose header text the sentence contains.
My code is:
private void btnClassify_Click(object sender, EventArgs e)
{
for (int i = 0; i < dGvTopics.Columns.Count; i++)
{
if (dataGridView2.Rows[i].Cells[0].Value.ToString().Contains(dGvTopics.Columns[i].HeaderText))
{
this.dGvTopics.Rows.Add();
this.dGvTopics.Rows[i].Cells[i].Value = dataGridView2.Rows[i].Cells[0].Value;
}
}
}
We can discuss later why you are doing this at all, there are easier ways :)
You need to understand that there are two dimensions to iterate over here, the rows in dataGridView2 and the columns in dGvTopics. This means you need two looping statements, not just one.
Your current code loops through the rows in dataGridView2, but only for the number of columns that are in dGvTopics, which is a bit confusing.
PRO TIP: Don't use arbitrary single-character variable names that have no meaning. Yes, i is ubiquitously used to represent an index in code you will find around the web, but that doesn't mean it is good practice. i should be reserved for lazy programming where you iterate over a single one-dimensional array. In your example you are accessing 4 different levels of arrays, so the meaning of i is ambiguous.
Instead of i, use a meaningful variable name like columnIndex or topicIndex. That way, when each line is reviewed in isolation, the code is more self-documenting. I would even accept t or c in this code; taking the first initial of the conceptual variable helps spot common errors where the wrong indexer is used for the wrong array.
Yes, this makes the code wordy and long, but we're not constrained by memory space in the same way as our developer ancestors, and it doesn't change the size of the final executable. Strive to make your code self-documenting.
If you are programming in a code-memory-constrained environment, such as microcontrollers or tiny chipsets, still use meaningful short variable names, not arbitrarily selected characters.
Applying the above recommendation highlights this first issue:
for (int columnIndex = 0; columnIndex < dGvTopics.Columns.Count; columnIndex ++)
{
if (dataGridView2.Rows[columnIndex].Cells[0].Value.ToString().Contains(dGvTopics.Columns[columnIndex].HeaderText))
{
this.dGvTopics.Rows.Add();
this.dGvTopics.Rows[columnIndex].Cells[columnIndex].Value = dataGridView2.Rows[columnIndex].Cells[0].Value;
}
}
Now we can see that each iteration moves down the rows and across the cells at the same rate, meaning that only the cells along the diagonal will ever be compared and given a value.
The next issue is that you only create a row when the comparison returns true, so dGvTopics may contain fewer rows than you expect, that is, fewer than the value of i (or columnIndex) requires. Rows[columnIndex] will then raise an out-of-range exception (for a DataGridView row collection this is an ArgumentOutOfRangeException) on the next successful comparison after any comparison that fails.
You can avoid this problem by iterating over the rows and columns separately and adding one row in dGvTopics for every row in dataGridView2.
We can also make the code clearer by saving a reference to the currentSentence rather than reaching the sentence through the array indexers each time.
private void btnClassify_Click(object sender, EventArgs e)
{
// remove any existing rows, we will reprocess all records.
this.dGvTopics.Rows.Clear();
// Iterate over the rows in the list of sentences.
for (int rowIndex = 0; rowIndex < dataGridView2.Rows.Count; rowIndex ++)
{
// Create one topic row for every sentence
// row index will always be valid now.
this.dGvTopics.Rows.Add();
// save the sentence value to simplify the comparison code.
string currentSentence = dataGridView2.Rows[rowIndex].Cells[0].Value.ToString();
// iterate over the columns in the topics grid
for (int columnIndex = 0; columnIndex < dGvTopics.Columns.Count; columnIndex ++)
{
if (currentSentence.Contains(dGvTopics.Columns[columnIndex].HeaderText))
{
this.dGvTopics.Rows[rowIndex].Cells[columnIndex].Value = currentSentence;
}
}
}
}
It's not easy to comprehend why you want to do this or how this information will be used. For manipulating values in cells we generally recommend using databinding techniques instead; that way you no longer access rows and cells but the underlying objects that they represent.
Demonstrating this is outside of the scope of this question, but it's an avenue worth researching when you have time.
In solutions like this, where there are two grids that represent the same logical component (in this case each row in each grid represents the same sentence value), the underlying data object might be a single list, where one property on the object is the sentence and each topic column is a property on the same object.
Importantly, using databinding means that the next process that needs to use the information that you have displayed or edited in the grids can do so without access to or knowledge about the grids at all... Something to think about ;)
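A full databinding walkthrough is not attempted here, but the shape of such an underlying object can be sketched without any grid at all. Everything below is an assumption for illustration: ClassifiedSentence and Classifier are hypothetical names, and the matched topics are kept as a list rather than as one property per topic, simply to keep the sketch short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model: one object per sentence, with the topics it matched alongside it.
public sealed class ClassifiedSentence
{
    public string Sentence { get; }
    public IReadOnlyList<string> Topics { get; }

    public ClassifiedSentence(string sentence, IReadOnlyList<string> topics)
    {
        Sentence = sentence;
        Topics = topics;
    }
}

public static class Classifier
{
    // Builds the list that both grids could be bound to; no grid access is needed.
    public static List<ClassifiedSentence> Classify(
        IEnumerable<string> sentences, IReadOnlyList<string> topics)
    {
        return sentences
            .Select(s => new ClassifiedSentence(
                s, topics.Where(t => s.Contains(t)).ToList()))
            .ToList();
    }

    public static void Main()
    {
        var result = Classify(
            new[] { "the cat sat", "a dog" },
            new[] { "cat", "dog", "sat" });
        foreach (var item in result)
            Console.WriteLine(item.Sentence + " -> " + string.Join(",", item.Topics));
    }
}
```

Any later step that needs the classification can then work from this list directly instead of reading values back out of grid cells.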
Update
This code may result in many empty cells in the topics grid. We could instead add rows only as they are needed, but that requires a lot more effort.
NOTE: Grids render all the cells for each row. In the last couple of rows there may still be empty cells, as long as at least one of the cells in that row has a value.
private void btnClassify_Click(object sender, EventArgs e)
{
// remove any existing rows, we will reprocess all records.
this.dGvTopics.Rows.Clear();
// Iterate over the rows in the list of sentences.
for (int rowIndex = 0; rowIndex < dataGridView2.Rows.Count; rowIndex ++)
{
// save the sentence value to simplify the comparison code.
string currentSentence = dataGridView2.Rows[rowIndex].Cells[0].Value.ToString();
// iterate over the columns in the topics grid
for (int columnIndex = 0; columnIndex < dGvTopics.Columns.Count; columnIndex ++)
{
if (currentSentence.Contains(dGvTopics.Columns[columnIndex].HeaderText))
{
// first we need to know what row index to add this value into
// that involves another iteration, we could store last index in another structure to make this quicker, but here we will do it from first principles.
bool inserted = false;
for(int lookupRow = 0; lookupRow < this.dGvTopics.Rows.Count; lookupRow ++)
{
// find the first row with a null cell in this column
if(this.dGvTopics.Rows[lookupRow].Cells[columnIndex].Value == null)
{
this.dGvTopics.Rows[lookupRow].Cells[columnIndex].Value = currentSentence;
inserted = true;
break;
}
}
if(!inserted)
{
this.dGvTopics.Rows.Add();
this.dGvTopics.Rows[this.dGvTopics.Rows.Count-1].Cells[columnIndex].Value = currentSentence;
}
}
}
}
}
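The code comment above mentions that the lookup loop could be avoided by storing the last index in another structure. A possible sketch of that idea follows, using plain arrays in place of the grid so it can be run on its own. TopicPacker, Pack and nextFreeRow are hypothetical names; a null entry stands for an empty cell.

```csharp
using System;
using System.Collections.Generic;

public static class TopicPacker
{
    // Packs matching sentences into the top of each topic column.
    // Returns the rows of cells; a null cell means "empty".
    public static List<string[]> Pack(
        IReadOnlyList<string> sentences, IReadOnlyList<string> topics)
    {
        var rows = new List<string[]>();
        // For each column, the index of the first row that is still empty.
        var nextFreeRow = new int[topics.Count];

        foreach (string currentSentence in sentences)
        {
            for (int columnIndex = 0; columnIndex < topics.Count; columnIndex++)
            {
                if (!currentSentence.Contains(topics[columnIndex])) continue;

                int rowIndex = nextFreeRow[columnIndex]++;
                // Add a row only when this column has filled every existing row.
                if (rowIndex == rows.Count) rows.Add(new string[topics.Count]);
                rows[rowIndex][columnIndex] = currentSentence;
            }
        }
        return rows;
    }

    public static void Main()
    {
        var rows = Pack(
            new[] { "cat and dog", "dog only", "cat again" },
            new[] { "cat", "dog" });
        foreach (var row in rows)
            Console.WriteLine(string.Join(" | ", row));
    }
}
```

In the grid version, the same counter array would replace the inner lookupRow loop, with dGvTopics.Rows.Add() called in the place where this sketch adds a new array.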
Many thanks to Mr Chris Schaller.
Following his description, this is the final code I ended up with after getting it to compile:
private void btnClassify_Click(object sender, EventArgs e)
{
// remove any existing rows, we will reprocess all records.
this.dGvTopics.Rows.Clear();
// Iterate over the rows in the list of sentences.
for (int rowIndex = 0; rowIndex < dataGridView2.Rows.Count; rowIndex++)
{
// save the sentence value to simplify the comparison code.
string currentSentence = dataGridView2.Rows[rowIndex].Cells[0].Value.ToString();
// iterate over the columns in the topics grid
for (int columnIndex = 0; columnIndex < dGvTopics.Columns.Count; columnIndex++)
{
if (currentSentence.Contains(dGvTopics.Columns[columnIndex].HeaderText))
{
// first we need to know what row index to add this value into
// that involves another iteration, we could store last index in another structure to make this quicker, but here we will do it from first principles.
bool inserted = false;
for (int lookupRow = 0; lookupRow < this.dGvTopics.Rows.Count; lookupRow++)
{
// find the first row with a null cell;
if (this.dGvTopics.Rows[lookupRow].Cells[columnIndex].Value == null)
{
this.dGvTopics.Rows[lookupRow].Cells[columnIndex].Value = currentSentence;
inserted = true;
break;
}
}
if (!inserted)
{
this.dGvTopics.Rows.Add();
this.dGvTopics.Rows[this.dGvTopics.Rows.Count - 1].Cells[columnIndex].Value = currentSentence;
}
}
}
}
}
I am using the following to save the current row of a DataGridView. It saves the right row index, but the grid does not move back to that row after the processing is done.
saveRow = dgStock.CurrentCell.RowIndex;
BindGrid();
if (saveRow != 0 && saveRow < dgStock.Rows.Count)
{
dgStock.Rows[saveRow].Selected = true;
}
I was wondering if anybody has had any experience of this. Should I be setting CurrentCell instead, since that is how I save the index in the first place? For some reason the grid goes back to the first row after it is rebound.
I'm trying to get the count of rows which don't have a value in any of their columns.
Sample image of the Excel file I'm using:
The highlighted rows have values in some columns; the rest of the rows are blank, and those are the rows I need to count.
I already tried this method:
int blankRows = 0;
double notEmpty = 1;
while (notEmpty > 0)
{
string aCellAddress = "A" + (rowIndex++).ToString();
Excel.Range row = excelApp.get_Range(aCellAddress, aCellAddress).EntireRow;
notEmpty = excelApp.WorksheetFunction.CountA(row);
if (notEmpty <= 0)
{
blankRows++;
}
}
but this is a very time-consuming process when the file is large and has only a few blank rows.
One thing that might help is to find the last column and the last row that contain data, so you can limit your search.
This is a VBA code snippet, but it could easily be translated to C#:
'iterate through columns to determine which is longest to determine the highest row number.
For i = 1 To 16384 'number of columns in excel
'get the row
rowcount = ws.Cells(ws.Rows.Count, i).End(xlUp).Row
'check to see if it's larger than what it is now, if it is, set the value of lRow.
If rowcount > lrow Then
lrow = rowcount
End If
Next
Then use a similar loop to get the last column: step through each row up to the last row found above, and record the highest column that contains data.
You can use those values to limit the range that you're looking through. I'm not sure if it will be any faster, but it might help.
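Once the range is limited, another option is to do the counting in memory rather than cell by cell. The sketch below assumes the values have already been read into a two-dimensional object array in a single call, for example from the Value2 property of a multi-cell range (which comes back 1-based; a single-cell range returns a scalar and is not handled here). BlankRowCounter is a hypothetical name, and the array bounds are queried rather than assumed so the same method works for 0-based test data.

```csharp
using System;

public static class BlankRowCounter
{
    // Counts rows in which every cell is null or an empty string.
    public static int CountBlankRows(object[,] values)
    {
        int blank = 0;
        for (int r = values.GetLowerBound(0); r <= values.GetUpperBound(0); r++)
        {
            bool hasValue = false;
            for (int c = values.GetLowerBound(1); c <= values.GetUpperBound(1); c++)
            {
                object v = values[r, c];
                if (v != null && v.ToString().Length > 0)
                {
                    hasValue = true;
                    break; // one value is enough to rule the row out
                }
            }
            if (!hasValue) blank++;
        }
        return blank;
    }

    public static void Main()
    {
        var values = new object[,]
        {
            { "a", null },
            { null, null },
            { "", null },
            { null, 3.0 }
        };
        Console.WriteLine(CountBlankRows(values)); // 2
    }
}
```

Whether this is faster on a given workbook is not something the sketch can promise, but it replaces one Interop call per row with one call for the whole range.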
Creating and customizing an Excel file using Interop with Office 2013 installed is extremely slow for me (more than 5 minutes).
The same thing works well with Excel 2010 Interop (just 50 seconds for exactly the same process). (Code snippet below.)
It would be nice to know if there is a faster way to do this. I know there are other libraries for this, but I would like to stick to Interop since everything else already uses it.
I create the Excel file first, then check whether there are any empty cells or cells containing a specific string, and change the color of those cells.
To create the Excel file I write an object array to the sheet, which is really fast. The main thing slowing it down is searching for the cells and changing their color.
// Check for empty cell and make interior silver color
for (int row = 0; row < rowNo; row++)
{
for (int col = 0; col < columnNo; col++)
{
if (string.IsNullOrEmpty(objData[row, col].ToString()))
{
// Access that cell in Excel now and change interior color
Range cell = (Range)activeSheet.Cells[row + 2, col + 1];
cell.Interior.Color = System.Drawing.Color.Silver;
}
}
}
// Check for cells contains "column header string#"
for (int col = 16; col < columnNo; col++)
{
// Get column header - only once and use it for all rows in the same column
string cellValue = activeSheet.Cells[1, col + 1].Value2.ToString();
for (int row = 0; row < rowNo; row++)
{
string value = objData[row, col].ToString();
if (string.IsNullOrEmpty(value) || value.Contains(cellValue+"#"))
{
Range cell = (Range)activeSheet.Cells[row + 2, col + 1];
cell.Interior.Color = System.Drawing.Color.Silver;
cell.Font.Color = System.Drawing.Color.Red;
}
}
}
I'd look into selecting the whole range and setting conditional formatting so that blank cells get the color you want. Sorry, I don't have any code for you; it's been a while since I've played in that realm.