I am trying to write a method that partitions a DataTable based on a given condition (delegate). My problem is that the condition I'm using always partitions exactly half the DataTable. The condition seems to resolve true for half the DataRows even when NO DataRows should resolve to true.
The method looks like this:
private DataTable PartitionDataTable(DataTable data, Func<DataRow, bool> condition) {
DataTable removedRows = data.Clone();
for(int i = 0; i < data.Rows.Count; i++) {
if(condition(data.Rows[i])){
removedRows.ImportRow(data.Rows[i]);
data.Rows.Remove(data.Rows[i]);
}
}
return removedRows;
}
I call this method using this condition:
DataTable removed = PartitionDataTable(data, row => DateTimeOffset.Parse(row["timestamp"].ToString()) < baselineTimestamp);
If the highest/max timestamp in the data object (DataTable) is a few minutes earlier than the 'baselineTimestamp', determined using data.Compute("max([timestamp])", String.Empty), then half the records are still partitioned and removed when none of them should be because all of them are < baselineTimestamp.
No idea what's going on. Please help me. The goal is to partition off the DataRows with timestamps earlier than a given baseline (to the nearest millisecond).
You are removing rows as you iterate over the dataset. So if i=2, then you remove row 2, and row 3 is now row 2. You then increment i, operating on the new row 3 (which was row 4) so you skip the original row 3 altogether.
One trick to resolve this is to iterate backwards since the rows that are shifted are ones that you've already processed:
for(int i = data.Rows.Count-1; i >= 0; i--) {
if(condition(data.Rows[i])){
removedRows.ImportRow(data.Rows[i]);
data.Rows.Remove(data.Rows[i]);
}
}
return removedRows;
}
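Every time you remove a row from data, its Rows.Count decreases.
Suppose data contains 10 rows at the start and every row matches the condition.
By the time i has been incremented to 5, you have removed 5 rows from data.
At that point i is 5 and data.Rows.Count is also 5, so the loop condition fails and the loop terminates with half the rows never examined.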
Since you are removing elements from an array, you have to move backward. If you do it forward, you'll skip half of the elements, and this is why you get half of them back:
DataTable removedRows = data.Clone();
for(int i = data.Rows.Count-1; i >= 0 ; i--) {
if(condition(data.Rows[i])){
removedRows.ImportRow(data.Rows[i]);
data.Rows.Remove(data.Rows[i]);
}
}
return removedRows;
}
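The difference between the two loop directions can be reproduced in isolation. The following is a minimal sketch, assuming a hypothetical ten-row table with a single "id" column and a condition that matches every row; PartitionDemo, BuildTable, PartitionForward and PartitionBackward are illustrative names, not part of the original code.

```csharp
using System;
using System.Data;

public static class PartitionDemo
{
    // Hypothetical sample data: ten rows with ids 0..9.
    public static DataTable BuildTable()
    {
        var table = new DataTable();
        table.Columns.Add("id", typeof(int));
        for (int id = 0; id < 10; id++)
        {
            table.Rows.Add(id);
        }
        return table;
    }

    // Forward loop from the question: each removal shifts the later rows
    // down by one, so the row that slides into index i is never tested.
    public static DataTable PartitionForward(DataTable data, Func<DataRow, bool> condition)
    {
        DataTable removedRows = data.Clone();
        for (int i = 0; i < data.Rows.Count; i++)
        {
            if (condition(data.Rows[i]))
            {
                removedRows.ImportRow(data.Rows[i]);
                data.Rows.Remove(data.Rows[i]);
            }
        }
        return removedRows;
    }

    // Backward loop from the answers: only rows that were already
    // visited change position, so every row is tested exactly once.
    public static DataTable PartitionBackward(DataTable data, Func<DataRow, bool> condition)
    {
        DataTable removedRows = data.Clone();
        for (int i = data.Rows.Count - 1; i >= 0; i--)
        {
            if (condition(data.Rows[i]))
            {
                removedRows.ImportRow(data.Rows[i]);
                data.Rows.Remove(data.Rows[i]);
            }
        }
        return removedRows;
    }

    public static void Main()
    {
        DataTable forwardData = BuildTable();
        DataTable forwardRemoved = PartitionForward(forwardData, row => true);
        Console.WriteLine("forward:  " + forwardRemoved.Rows.Count + " removed, " + forwardData.Rows.Count + " left");

        DataTable backwardData = BuildTable();
        DataTable backwardRemoved = PartitionBackward(backwardData, row => true);
        Console.WriteLine("backward: " + backwardRemoved.Rows.Count + " removed, " + backwardData.Rows.Count + " left");
    }
}
```

Note that the backward version returns the removed rows in reverse order; if the original order matters, the result would need to be re-sorted.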
Related
I am trying to delete rows from a datatable that have an empty or null cell, at the same time I check if a column has empty cells exceeding a percentage, if it's the case I drop the whole column. I tried proceeding like so:
private DataTable CleanData()
{
var dt = BindData(openFileDialog1.FileName);
for (var j = dt.Columns.Count-1; j >= 0; j--)
{
short count = 0;
for (var i = dt.Rows.Count - 1; i >= 0; i--)
{
if (!string.IsNullOrEmpty(dt.Rows[i][j].ToString())) continue;
count++;
}
var percentage = count * 100.0 / dt.Rows.Count;
if (percentage > 10)
{
dt.Columns.RemoveAt(j);
textFile.Text += " " + j + " ";
}
}
dt.AcceptChanges();
for (var j = dt.Columns.Count - 1; j >= 0; j--)
for (var i = dt.Rows.Count - 1; i >= 0; i--)
{
if (!string.IsNullOrEmpty(dt.Rows[i][j].ToString())) continue;
dt.Rows[i].Delete();
}
dt.AcceptChanges();
return dt;
}
I loop a first time over the datatable cells and check the percentage of empty cells in each column; if it exceeds 10%, I delete that column. Then I loop a second time and delete each row that has an empty cell. On the second loop, however, I get an error (System.Data.DeletedRowInaccessibleException) when it reaches a deleted column index, even though it is supposed to be looping over a datatable where those columns are no longer there.
Any clue where I messed up?
Edit: I made the changes proposed but still getting the same error
What I THINK you are running into is an unexpected side-effect of your loop checking % and deleting columns. You are starting with the 0-index column (1st column). Checking and then deleting if empty. Do it in reverse... start with the LAST column and work back to 0 and here is why.
Say you start with a table of 3 columns, so your loop counter is intended to go 0, 1, 2. First cycle through, loop counter 0. You determine the data is good, no delete. Counter = 1 (2nd column). You determine it needs to be removed due to % empty, so you delete column[1]. What WAS column[2] now becomes column[1], and your counter advances to 2. You never checked what WAS the third column.
If you go in reverse, you start at column[2], check it and find it's OK (or not, it doesn't matter). Now go down 1 to column[1] and determine it should be removed. It gets deleted, and what was column[2] becomes column[1], but you have already processed that one. Now you check column[0] and finish with no problem.
You are already doing this when checking the ROWS (starting at the end and working back). Same principle applies.
As for your loop on deleting the ROW, I would invert your loops.
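// Outer loop per ROW (last row first, working back)
for (var i = dt.Rows.Count - 1; i >= 0; i--)
{
// Inner loop per COLUMN
for (var j = 0; j < dt.Columns.Count; j++)
{
// any single empty cell qualifies the row for deletion
if (string.IsNullOrEmpty(dt.Rows[i][j].ToString()))
{
dt.Rows[i].Delete();
break; // break out of the column checking loop
}
}
// continue with the next ROW
}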
Since your existing outer loop is per column, if you process column 1 and delete row 5, then get to column 2 and read row 5 again, you are touching a row that is already marked as deleted, and that is your failure.
By checking all columns for a single row FIRST and getting out as soon as one qualifies for deletion, you are done with that row and never need to consider looking at any other columns. Move to the next row for processing.
I'm using visual studio 2017, C#, Windows Forms to create an index for words in a list of sentences.
I have two datagridview:
dataGridView2: This grid has a single column where each row contains a worded sentence.
dGvTopics: This grid has one column for every word that is repeated in the first sentence (first row) in dataGridView2, the column header text is the word.
Goal: I want to click a button to categorize the sentences: insert a row in dGvTopics for each row (sentence) in dataGridView2, and place a copy of the sentence in every column whose header text the sentence contains.
My Code is:
private void btnClassify_Click(object sender, EventArgs e)
{
for (int i = 0; i < dGvTopics.Columns.Count; i++)
{
if (dataGridView2.Rows[i].Cells[0].Value.ToString().Contains(dGvTopics.Columns[i].HeaderText))
{
this.dGvTopics.Rows.Add();
this.dGvTopics.Rows[i].Cells[i].Value = dataGridView2.Rows[i].Cells[0].Value;
}
}
}
We can discuss later why you are doing this at all, there are easier ways :)
You need to understand that there are two dimensions to iterate here, the rows in dataGridView2 and the columns in dGvTopics, this means you will need two looping statements, not just one.
Your current code is looping through the Rows in dataGridView2 but only for the number of columns that are in dGvTopics which is a bit confusing.
PRO TIP: Don't use arbitrary single character variable names that have no meaning. Yes, i is ubiquitously used to represent an index in code you will find around the web, but that doesn't mean it is good practice. i should be reserved for lazy programming where there is a single one-dimensional array that you are iterating over. In your example there are 4 different levels of arrays that you are accessing, so the meaning of i is now ambiguous.
Instead of i, use a meaningful variable name like columnIndex or topicIndex. That way, when each line is reviewed in isolation, the code is more self-documenting. I would even accept t or c in this code; taking the first initial from the conceptual variable meaning will help spot common errors where the wrong indexer is used for the wrong array.
Yes, this makes the code wordy and long, but we're not constrained by memory space in the same way as our developer ancestors, and it doesn't change the size of the final executable. Strive to make your code self-documenting.
If you are programming in a code-memory-constrained environment, like for micro-controllers, or tiny chipsets, then still use meaningful short variables, not arbitrarily selected characters.
Applying the above recommendation highlights this first issue:
for (int columnIndex = 0; columnIndex < dGvTopics.Columns.Count; columnIndex ++)
{
if (dataGridView2.Rows[columnIndex].Cells[0].Value.ToString().Contains(dGvTopics.Columns[columnIndex].HeaderText))
{
this.dGvTopics.Rows.Add();
this.dGvTopics.Rows[columnIndex].Cells[columnIndex].Value = dataGridView2.Rows[columnIndex].Cells[0].Value;
}
}
Now we can see that each iteration is moving down the rows, but across the cells at the same rate, meaning that only the cells in a diagonal formation will even be compared and have a value.
The next issue is that you only create a row when the comparison returns true, so dGvTopics may contain fewer rows than you expect, that is, fewer than the value of i (or columnIndex). The next successful comparison after any failed one will then index a row that does not exist and raise an out-of-range exception.
You can avoid this problem by iterating over the rows and columns separately and adding one row in dGvTopics for every row in dataGridView2.
We can also make the code clearer by saving a reference to the currentSentence rather than referencing the sentence through the array indexers.
private void btnClassify_Click(object sender, EventArgs e)
{
// remove any existing rows, we will reprocess all records.
this.dGvTopics.Rows.Clear();
// Iterate over the rows in the list of sentences.
for (int rowIndex = 0; rowIndex < dataGridView2.Rows.Count; rowIndex ++)
{
// Create one topic row for every sentence
// row index will always be valid now.
this.dGvTopics.Rows.Add();
// save the sentence value to simplify the comparison code.
string currentSentence = dataGridView2.Rows[rowIndex].Cells[0].Value.ToString();
// iterate over the columns in the topics grid
for (int columnIndex = 0; columnIndex < dGvTopics.Columns.Count; columnIndex ++)
{
if (currentSentence.Contains(dGvTopics.Columns[columnIndex].HeaderText))
{
this.dGvTopics.Rows[rowIndex].Cells[columnIndex].Value = currentSentence;
}
}
}
}
It's not easy to comprehend why you want to do this or how this information will be used. For manipulating values in cells we generally recommend databinding techniques instead; that way you no longer access rows and cells, but the underlying objects that they represent.
Demonstrating this is outside the scope of this question, but it's an avenue worth researching when you have time.
In solutions like this where there are two grids that represent the same logical component, (in this case each row in each grid represents the same sentence value) the underlying dataobject might be a single list, where one property on the object is the sentence and each topic column is a property on the same object.
Importantly, using databinding means that the next process that needs to use the information that you have displayed or edited in the grids can do so without access to or knowledge about the grids at all... Something to think about ;)
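As a rough illustration of that idea, the sketch below keeps the classification in plain objects and leaves the grids out entirely. It assumes hypothetical names (SentenceTopics, TopicClassifier.Classify) and models the topic columns as dictionary entries rather than as separate properties; it is not a databinding implementation, only the kind of underlying list that one could be built on.

```csharp
using System;
using System.Collections.Generic;

// One object per sentence; hypothetical type, not from the original code.
public class SentenceTopics
{
    public string Sentence { get; set; }

    // Topic word -> the sentence if it contains that word, otherwise null.
    public Dictionary<string, string> Topics { get; } = new Dictionary<string, string>();
}

public static class TopicClassifier
{
    // Builds one SentenceTopics object per sentence, with one entry per topic.
    public static List<SentenceTopics> Classify(IList<string> sentences, IList<string> topics)
    {
        var result = new List<SentenceTopics>();
        foreach (string sentence in sentences)
        {
            var item = new SentenceTopics { Sentence = sentence };
            foreach (string topic in topics)
            {
                item.Topics[topic] = sentence.Contains(topic) ? sentence : null;
            }
            result.Add(item);
        }
        return result;
    }

    public static void Main()
    {
        var rows = Classify(
            new[] { "the cat sat", "a dog barked" },
            new[] { "cat", "dog" });

        foreach (SentenceTopics row in rows)
        {
            Console.WriteLine(row.Sentence
                + ": cat=" + (row.Topics["cat"] != null)
                + ", dog=" + (row.Topics["dog"] != null));
        }
    }
}
```

Any later step that needs the classification can read this list directly, without knowing that a grid exists.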
Update
This code may result in many empty cells in the topics grid. We could instead only add rows as they are needed, but to do this will require a lot more effort.
NOTE: Grids render all the cells for each row. In the last couple of rows there may still be empty cells, as long as at least one of the cells in that row has a value.
private void btnClassify_Click(object sender, EventArgs e)
{
// remove any existing rows, we will reprocess all records.
this.dGvTopics.Rows.Clear();
// Iterate over the rows in the list of sentences.
for (int rowIndex = 0; rowIndex < dataGridView2.Rows.Count; rowIndex ++)
{
// save the sentence value to simplify the comparison code.
string currentSentence = dataGridView2.Rows[rowIndex].Cells[0].Value.ToString();
// iterate over the columns in the topics grid
for (int columnIndex = 0; columnIndex < dGvTopics.Columns.Count; columnIndex ++)
{
if (currentSentence.Contains(dGvTopics.Columns[columnIndex].HeaderText))
{
// first we need to know what row index to add this value into
// that involves another iteration, we could store last index in another structure to make this quicker, but here we will do it from first principles.
bool inserted = false;
for(int lookupRow = 0; lookupRow < this.dGvTopics.Rows.Count; lookupRow ++)
{
// find the first row with a null cell in this column
if(this.dGvTopics.Rows[lookupRow].Cells[columnIndex].Value == null)
{
this.dGvTopics.Rows[lookupRow].Cells[columnIndex].Value = currentSentence;
inserted = true;
break;
}
}
if(!inserted)
{
this.dGvTopics.Rows.Add();
this.dGvTopics.Rows[this.dGvTopics.Rows.Count-1].Cells[columnIndex].Value = currentSentence;
}
}
}
}
}
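The comment in the code above mentions that the lookup loop could be avoided by remembering, for each column, the next free row. A minimal sketch of that idea, assuming the grid is modelled as a plain list of string arrays (the TopicPacker class and its Pack method are hypothetical, and no DataGridView is involved):

```csharp
using System;
using System.Collections.Generic;

public static class TopicPacker
{
    // Returns rows of cells; each column is filled top-down with the
    // sentences that contain that column's topic word.
    public static List<string[]> Pack(IList<string> sentences, IList<string> topics)
    {
        var rows = new List<string[]>();
        // nextFreeRow[c] is the index of the first empty cell in column c.
        var nextFreeRow = new int[topics.Count];

        foreach (string sentence in sentences)
        {
            for (int columnIndex = 0; columnIndex < topics.Count; columnIndex++)
            {
                if (!sentence.Contains(topics[columnIndex])) continue;

                int target = nextFreeRow[columnIndex];
                if (target == rows.Count)
                {
                    // No existing row has a free cell in this column, so append one.
                    rows.Add(new string[topics.Count]);
                }
                rows[target][columnIndex] = sentence;
                nextFreeRow[columnIndex] = target + 1;
            }
        }
        return rows;
    }

    public static void Main()
    {
        List<string[]> rows = Pack(
            new[] { "cat dog", "dog", "cat" },
            new[] { "cat", "dog" });
        Console.WriteLine("rows needed: " + rows.Count);
    }
}
```

This trades the inner search for one integer per column; the same counters could be kept alongside a real grid, with Rows.Add() called wherever the sketch appends a new array.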
Many thanks to Mr Chris Schaller,
According to his description, the final code changed as follows after compiling:
private void btnClassify_Click(object sender, EventArgs e)
{
// remove any existing rows, we will reprocess all records.
this.dGvTopics.Rows.Clear();
// Iterate over the rows in the list of sentences.
for (int rowIndex = 0; rowIndex < dataGridView2.Rows.Count; rowIndex++)
{
// save the sentence value to simplify the comparison code.
string currentSentence = dataGridView2.Rows[rowIndex].Cells[0].Value.ToString();
// iterate over the columns in the topics grid
for (int columnIndex = 0; columnIndex < dGvTopics.Columns.Count; columnIndex++)
{
if (currentSentence.Contains(dGvTopics.Columns[columnIndex].HeaderText))
{
// first we need to know what row index to add this value into
// that involves another iteration, we could store last index in another structure to make this quicker, but here we will do it from first principles.
bool inserted = false;
for (int lookupRow = 0; lookupRow < this.dGvTopics.Rows.Count; lookupRow++)
{
// find the first row with a null cell;
if (this.dGvTopics.Rows[lookupRow].Cells[columnIndex].Value == null)
{
this.dGvTopics.Rows[lookupRow].Cells[columnIndex].Value = currentSentence;
inserted = true;
break;
}
}
if (!inserted)
{
this.dGvTopics.Rows.Add();
this.dGvTopics.Rows[this.dGvTopics.Rows.Count - 1].Cells[columnIndex].Value = currentSentence;
}
}
}
}
}
I'm trying to get the row count of rows which don't have any value (any of columns)
Sample image of the Excel file I'm using:
The highlighted rows have values in some columns; the rest of the rows are blank, and those are the rows I need to count.
I already used this method
int blankRows = 0;
double notEmpty = 1;
while (notEmpty > 0)
{
string aCellAddress = "A" + (rowIndex++).ToString();
Excel.Range row = excelApp.get_Range(aCellAddress, aCellAddress).EntireRow;
notEmpty = excelApp.WorksheetFunction.CountA(row);
if (notEmpty <= 0)
{
blankRows++;
}
}
But this is a very time-consuming process when the file is large and contains only a few blank rows.
One thing that might help would be to find the last column that has data and last row that has data as to limit your search.
This is VBA code snippet, but could be easily transformed to C#:
'iterate through columns to determine which is longest to determine the highest row number.
For i = 1 To 16384 'number of columns in excel
'get the row
rowcount = ws.Cells(Rows.Count, i).End(xlUp).Row
'check to see if it's larger than what it is now, if it is, set the value of lRow.
If rowcount > lrow Then
lrow = rowcount
End If
Next
Then use a similar loop to get the last column: step through each row up to the last row found above and keep track of the highest column that contains data.
You can use those values to limit the range that you're looking through. I'm not sure if it will be any faster, but it might help.
I have a problem where I can either update by one row (and that's it) or by four at once.
The issue is with tbl.Rows.AddAt(tbl.Rows.Count - 1, tr1) on the last two lines, comments are indicating what happens when the statement is placed there.
There will be a limit of 6, but I can't figure out why I can't update one row at a time more than once with the code I've got.
The for loops are only allowing four cells per row and no more than 4 rows (not including the initial row at start).
Can you point me in the right direction please?
C#:
public void addRows_Click1(object sender, EventArgs e)
{
rmvRows.Visible = true;
// rows
for (int rowCount = 0; rowCount < 4; rowCount++ )
{
tr1 = new TableRow();
// cells
for (int cellCount = 0; cellCount < 4; cellCount++)
{
tc1 = new TableCell();
tb1 = new TextBox();
tb1.ID = "tbID" + cellCount.ToString();
tc1.Controls.Add(tb1);
tr1.Cells.Add(tc1);
}
tbl.Rows.AddAt(tbl.Rows.Count - 1, tr1);// will add four more rows
}
tbl.Rows.AddAt(tbl.Rows.Count - 1, tr1)// adds one one but no more
}
This line:
tbl.Rows.AddAt(tbl.Rows.Count - 1, tr1);// will add four more rows
is within a for loop that is running 4 times (from 0 to 3):
for (int rowCount = 0; rowCount < 4; rowCount++ )
{
tr1 = new TableRow();
/* code omitted */
tbl.Rows.AddAt(tbl.Rows.Count - 1, tr1);// will add four more rows
}
The line itself only adds one row each time it is called, but it is getting called 4 times. Therefore, the table has 4 additional rows after the for loop's execution.
As for your second line:
tbl.Rows.AddAt(tbl.Rows.Count - 1, tr1)// adds one one but no more
This is outside of any loop, so, like any other normal code, it only executes once. Since it only executes once, it only adds one row.
Additionally, since tr1 is not changed after the for loop, the final row added by that last line is going to be a duplicate of whatever the last row created by the loop was.
tbl.Rows.AddAt(tbl.Rows.Count - 1, tr1)
Only adds one row regardless of where it is called, because tr1 is just one row.
AddAt only allows one row to be added at a certain position.
The AddAt used in the loop should work just fine.
Hope that helps!
My current code:
Remove()
{
for (int i = 0; i < ConGridView.RowCount; i++)
{
if (ConGridView.Rows[i].Cells[0].Value.ToString() == Address)
{
ConGridView.Rows.RemoveAt(i);
break;
}
}
}
What I am trying to do is call the Remove function every time a client disconnects. The function removes the connection address from the DataGridView. It works well when clients disconnect one by one. However, if 100 connections get dropped and it tries to remove 100 connections in less than a second, then it errors out saying "Row Index provided is out of range". How should I check for that?
So far I've tried:
Try, catch.
if (ConGridView.Rows[i] != null), if (i < ConGridView.RowCount)
None of it seems to work so far. I've also seen cases with (i < ConGridView.RowCount) where i is 26 while RowCount is 24, but the RemoveAt call still runs.
Any idea on this?
You can't do this. Your code loops through all the rows in ConGridView, but it deletes them as you do. Therefore, at some point you will try to access an item you have deleted, which will cause the error you described.
Probably the best approach is to iterate through the rows in reverse order. This way, deleting a row at the end won't affect the rows you access later, nearer the start.
The problem is you initialise your for loop with the current count of rows and then start removing those same rows from the datagridview. At some point your for loop will try to remove a row at an index that is greater than the number of rows left.
Try this instead:
for (int i = ConGridView.RowCount - 1; i >= 0; i--)
{
if (ConGridView.Rows[i].Cells[0].Value.ToString() == Address)
{
ConGridView.Rows.RemoveAt(i);
break;
}
}
Why don't you store the total count in a separate variable and then iterate?
Remove()
{
int totalConnections = ConGridView.RowCount;
for (int i = 0; i < totalConnections ; i++)
{
if (ConGridView.Rows[i].Cells[0].Value.ToString() == Address)
{
ConGridView.Rows.RemoveAt(i);
break;
}
}
}
This issue occurs because you are modifying the collection you are iterating over. It is better to use a temporary collection and two loops to remove your entries.
Remove()
{
// You can use an array/list or whatever you want below.
Collection<DataGridViewRow> rowsToDelete = new Collection<DataGridViewRow>();
for (int i = 0; i < ConGridView.RowCount; i++)
{
if (ConGridView.Rows[i].Cells[0].Value.ToString() == Address)
{
rowsToDelete.Add(ConGridView.Rows[i]);
break;
}
}
// now remove the marked entries.
foreach(DataGridViewRow deletedRow in rowsToDelete)
{
ConGridView.Rows.Remove(deletedRow);
}
}
When you remove an item from an array, it is reconstructed; shifting the remaining elements up by one to remove the gap of the index you have removed.
1. guybrush threepwood
2. murray
3. elaine
4. Jimmy Gibbs Jr.
If you remove 2. item in here; it becomes this:
1. guybrush threepwood
2. elaine
3. Jimmy Gibbs Jr.
When you are iterating, imagine:
for (int i = 0; i < myArray.Count; i++)
{
if (i == 2) myArray.RemoveAt(i);
}
While running this, when i = 3, the element at 3 has changed, you expect it to be 'elaine' but it is 'Jimmy Gibbs Jr.'. One way to fix this is decrease i by one if we delete it, making sure that i refers to correct value.
for (int i = 0; i < myArray.Count; i++)
{
if (i == 2)
{
myArray.RemoveAt(i);
i--;
}
}
I would use RemoveAll in this case, though (a List<T> method that takes a predicate, in the same style as LINQ); everything is easier with that.
myArray.RemoveAll(x => x == "murray");
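Both fixes can be tried on a plain list. This is a small sketch, assuming myArray is a List<string> (RemoveAll is a List<T> method that takes a predicate); the RemovalDemo class and its method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class RemovalDemo
{
    // Sample data taken from the list in the answer above.
    public static List<string> Names()
    {
        return new List<string> { "guybrush threepwood", "murray", "elaine", "Jimmy Gibbs Jr." };
    }

    // Forward loop that steps back after each removal, so the element
    // that shifted into index i is still visited on the next iteration.
    public static List<string> RemoveWithIndexFix(List<string> items, string unwanted)
    {
        for (int i = 0; i < items.Count; i++)
        {
            if (items[i] == unwanted)
            {
                items.RemoveAt(i);
                i--;
            }
        }
        return items;
    }

    // Same result using the built-in predicate-based removal.
    public static List<string> RemoveWithPredicate(List<string> items, string unwanted)
    {
        items.RemoveAll(x => x == unwanted);
        return items;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", RemoveWithIndexFix(Names(), "murray")));
        Console.WriteLine(string.Join(", ", RemoveWithPredicate(Names(), "murray")));
    }
}
```

Both calls leave the three remaining names in their original order.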
I've tried all the suggestions posted by everyone here, however, the error was still there.
I've solved the problem in a different way... I've switched to a TreeView, since that's what I was going to use ultimately. Now I can remove as many connections as I want with:
foreach (TreeNode TN in ConTreeView.Nodes)
{
ConTreeView.Nodes.Remove(TN);
}