I get an error when trying to delete thousands of rows from an Excel file.
I am using EPPlus in C# to modify the data.
This is the code that performs the deletion:
rowsToDelete.Reverse();
foreach (var row in rowsToDelete)
{
    try
    {
        worksheet.DeleteRow(row);
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
    }
}
'rowsToDelete' is filled in an earlier section of code that collects the row numbers we need to delete, based on the data in the worksheet.
'rowsToDelete' is a List declared like this:
List<int> rowsToDelete = new List<int>();
The error that I am getting is:
Source array was not long enough. Check the source index, length, and the array's lower bounds. (Parameter 'sourceArray')
The Excel file has 65152 rows, and in this case I am trying to delete 65147 of them. Normally I won't delete this many rows, but this time I need to.
I noticed a pattern in which rows throw the error: the same exception is caught on every 1024th row.
I thought the problem might be deleting every row individually, so I switched to deleting them in one large block with:
worksheet.DeleteRow(rowFrom, rows, true);
But the same error would happen.
The Reverse() call above on rowsToDelete makes sure we delete from the bottom up (assuming the list was built in ascending row order), so rows shifting up after a deletion can't cause the wrong row to be deleted.
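Since the question already tried the block form of DeleteRow, here is a minimal sketch of how rowsToDelete could be turned into contiguous blocks that are deleted from the bottom up. It is plain C# with no EPPlus dependency; the RowRanges class and ToDeleteBlocks method are hypothetical names, and the sketch is not claimed to fix the 1024-row exception. Sorting in descending order also removes the need for the list to have been built in ascending order before Reverse() is called.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class RowRanges
{
    // Groups 1-based row numbers into contiguous (FirstRow, Count) blocks,
    // ordered from the bottom of the sheet upwards, so deleting one block never
    // shifts the row numbers of a block that is still waiting to be deleted.
    public static List<(int FirstRow, int Count)> ToDeleteBlocks(IEnumerable<int> rows)
    {
        var blocks = new List<(int FirstRow, int Count)>();
        // Distinct + descending: bottom-most row first, duplicates ignored,
        // regardless of the order the row numbers were collected in.
        foreach (int row in rows.Distinct().OrderByDescending(r => r))
        {
            if (blocks.Count > 0)
            {
                var last = blocks[blocks.Count - 1];
                // The current block starts at last.FirstRow; if this row sits
                // directly above it, grow the block upwards by one row.
                if (row == last.FirstRow - 1)
                {
                    blocks[blocks.Count - 1] = (row, last.Count + 1);
                    continue;
                }
            }
            blocks.Add((row, 1));
        }
        return blocks;
    }
}
```

Each block could then be passed to EPPlus as worksheet.DeleteRow(block.FirstRow, block.Count), which turns thousands of single-row deletions into a handful of calls when most of the rows are adjacent.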
Any help would be much appreciated!
While gathering more information for the question, I found my answer.
I was using:
List<int> rowsToDelete = new List<int>();
to initialize the list, but then I started experimenting with giving it a capacity on initialization, like this:
List<int> rowsToDelete = new List<int>(maxRowNumber);
where 'maxRowNumber' is set to 1,000,000.
Making this change stopped the error from being thrown altogether.
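For context, passing a capacity to the List<int> constructor only pre-sizes the backing array; Count and the contents are the same as with the parameterless constructor. The sketch below (hypothetical CapacityDemo class) illustrates that the list handed to DeleteRow is identical either way, so the capacity change alone doesn't explain why the exception stopped.

```csharp
using System.Collections.Generic;

public static class CapacityDemo
{
    // Fills two lists with the same row numbers: one created with the default
    // constructor, one created with an initial capacity. The capacity only
    // pre-allocates the backing array; it does not add any elements.
    public static (List<int> Plain, List<int> Presized) Build(int maxRowNumber, IEnumerable<int> rows)
    {
        var plain = new List<int>();
        var presized = new List<int>(maxRowNumber);
        foreach (int row in rows)
        {
            plain.Add(row);
            presized.Add(row);
        }
        return (plain, presized);
    }
}
```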
The saga of trying to chop flat files up into useable bits continues!
You may see from my other questions that I am trying to wrangle some flat file data into various bits using a C# script transformation in SSIS. The current challenge is turning a selection of rows with one column into one row with many columns.
A friend very kindly tipped me off to use a List and then somehow loop through it in PostExecute().
The main problem is that I do not know how to loop through the list and create a row to add to the output buffer programmatically. The flat file may list a variable number of fields; there is no consistency. For now, I have allowed for 100 outputs, named pos1, pos2, etc.
What I would really like to do is count everything in my list, and loop through that many times, incrementing the numbers accordingly - i.e. fieldlist[0] goes to OutputBuffer.pos1, fieldlist[1] goes to OutputBuffer.pos2, and if there is nothing after this then nothing is put in pos3 to pos100.
The secondary problem is that I can't even test whether my list and the write to an output table work by writing explicitly to OutputBuffer in PostExecute, never mind working out a loop.
The file has all sorts in it, but the list of fields is conveniently contained between START-OF-FIELDS and END-OF-FIELDS, so I have used the same logic as before to process only the rows between those markers.
bool passedSOF;
bool passedEOF;
List<string> fieldlist = new List<string>();
public override void PostExecute()
{
    base.PostExecute();
    OutputBuffer.AddRow();
    OutputBuffer.field1=fieldlist[0];
    OutputBuffer.field2=fieldlist[1];
}
public override void Input_ProcessInputRow(InputBuffer Row)
{
    if (Row.RawData.Contains("END-OF-FIELDS"))
    {
        passedEOF = true;
        OutputBuffer.SetEndOfRowset();
    }
    if (passedSOF && !passedEOF)
    {
        fieldlist.Add(Row.RawData);
    }
    if(Row.RawData.Contains("START-OF-FIELDS"))
    {
        passedSOF = true;
    }
}
I have nothing underlined in red, but when I try to run this I get an error message about PostExecute() and "object reference not set to an instance of an object", which I thought meant something contained a null where it shouldn't, but in my test file I have more than two fields between START and END markers.
So first of all, what am I doing wrong in the example above, and secondly, how do I do this in a proper loop? There are only 100 possible outputs right now, but this could increase over time.
"Post execute" It's named that for a reason.
The execution of your data flow has ended, and this method is for cleanup or anything that needs to happen after execution, such as modifying SSIS variables. The buffers have gone away; there's no way to interact with their contents at this point.
As for the rest of your problem statement, it needs more focus.
So once again I have misunderstood a basic concept - PostExecute cannot be used to write out in the way I was trying. As people have pointed out, there is no way to do anything with the buffer contents here.
I cannot take credit for this answer, as again someone smarter than me came to the rescue, but I have got permission from them to post the code in case it is useful to anyone. I hope I have explained this OK, as I only just understand it myself and am very much learning as I go along.
First of all, make sure the following using directives are at the top of the script:
using System.Reflection;
using System.Linq;
using System.Collections.Generic;
These are used to get the properties of the output buffer, so that the first item in the list can be written to pos_1, the second to pos_2, and so on.
As before, I have two boolean variables that record whether I have passed the rows marking the start and end of the data I want, plus my List.
bool passedSOF;
bool passedEOF;
List<string> fieldlist = new List<string>();
Here is where it differs: the row containing END-OF-FIELDS tells me I am done collecting rows, so when I hit it, I write the collected List out to the output buffer. The aim is to take the multiple rows containing field names and turn them into a single row with multiple columns, with the field names spread across those columns in the order they appeared.
public override void Input_ProcessInputRow(InputBuffer Row)
{
    if (Row.RawData.Contains("END-OF-FIELDS"))
    {
        passedEOF = true;
        //IF WE HAVE GOT TO THIS POINT, WE HAVE ALL THE DATA IN OUR LIST NOW
        OutputBuffer.AddRow();
        var fields = typeof(OutputBuffer).GetProperties();
        //SET UP AND INITIALISE A VARIABLE TO HOLD THE ROW NUMBER COUNT
        int rowNumber = 0;
        foreach (var fieldName in fieldlist)
        {
            //ADD ONE TO THE CURRENT VALUE OF rowNumber
            rowNumber++;
            //MATCH THE ROW NUMBER TO THE OUTPUT FIELD NAME (pos1, pos2, ...)
            PropertyInfo field = fields.FirstOrDefault(x => x.Name == string.Format("pos{0}", rowNumber));
            if (field != null)
            {
                field.SetValue(OutputBuffer, fieldName);
            }
        }
        OutputBuffer.SetEndOfRowset();
    }
    if (passedSOF && !passedEOF)
    {
        this.fieldlist.Add(Row.RawData);
    }
    if (Row.RawData.Contains("START-OF-FIELDS"))
    {
        passedSOF = true;
    }
}
So instead of having something like this:
START-OF-FIELDS
FRUIT
DAIRY
STARCHES
END-OF-FIELDS
I have the output:
pos_1 | pos_2 | pos_3
FRUIT | DAIRY | STARCHES
This lets me build a position key table showing which field appears in which position in the current monthly file, and now I am looking forward to getting myself into more trouble splitting the actual data rows out into another table :)
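To try the marker handling and the reflection-based mapping outside SSIS, here is a minimal sketch under stated assumptions: FieldRow is a hypothetical stand-in for the generated OutputBuffer class with only three columns, and FieldPivot, CollectFields and ToRow are illustrative names. It collects the lines between the markers using the same flag ordering as the answer, then copies item N into property posN via GetProperties() and PropertyInfo.SetValue.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical stand-in for the SSIS-generated output buffer: one string
// property per output column. The real class is generated from the outputs.
public class FieldRow
{
    public string pos1 { get; set; }
    public string pos2 { get; set; }
    public string pos3 { get; set; }
}

public static class FieldPivot
{
    // Returns the lines strictly between START-OF-FIELDS and END-OF-FIELDS,
    // checking the markers in the same order as the answer's passedSOF/passedEOF code.
    public static List<string> CollectFields(IEnumerable<string> lines)
    {
        var fieldlist = new List<string>();
        bool passedSOF = false, passedEOF = false;
        foreach (string line in lines)
        {
            if (line.Contains("END-OF-FIELDS")) passedEOF = true;
            if (passedSOF && !passedEOF) fieldlist.Add(line);
            if (line.Contains("START-OF-FIELDS")) passedSOF = true;
        }
        return fieldlist;
    }

    // Copies fieldlist[0] into pos1, fieldlist[1] into pos2, and so on.
    // Items with no matching posN property are skipped, as in the answer.
    public static FieldRow ToRow(IList<string> fieldlist)
    {
        var row = new FieldRow();
        PropertyInfo[] props = typeof(FieldRow).GetProperties();
        for (int i = 0; i < fieldlist.Count; i++)
        {
            string name = "pos" + (i + 1);
            PropertyInfo prop = props.FirstOrDefault(p => p.Name == name);
            prop?.SetValue(row, fieldlist[i]);
        }
        return row;
    }
}
```

Because columns are matched by name, supporting more than 100 positions later should only mean adding output columns; the loop itself does not change.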
I have created a new project, added a database and dataset, created a table, populated it, set it as a data source, and finally dragged a 'details' view and a 'datagridview' onto my form.
I have tested these and everything displays fine.
Now the problem: I want to read a cell value and then carry out a task based on the result, but every time I try, I get an out-of-range exception.
I've looked at other topics, tutorials and similar questions, and they suggest that I'm trying to access a cell that doesn't exist. This doesn't make sense, as I've tried various low-number index combinations, and my table has 4 columns and 16 rows, all of which contain data.
Everything else suggests the program doesn't understand that the datagridview actually contains data, as if it wants me to define it before it can read the data. That makes no sense to me, because I've pulled it from a database and I can view the data with no problems when I run the program.
I've tried various ways of reading a cell, but I get the same exception every time.
int testValue1 = (int)islandMapDataGridView[1, 1].Value;
Console.WriteLine(testValue1);
I should point out that I'm reasonably new to this. What am I doing wrong?
EDIT - Thank you for pointing me to a similar question with a lengthy answer - unfortunately I've looked at many just like this already. They are suggesting that I'm picking a cell that does not exist.
I am aware that these indexes start at 0 and I should compensate for this. Again, my datagridview has 4 columns and 16 rows. I am (for example) attempting to get the data at cell index 1,1 - which is well within the range of the datagridview.
The problem does not seem to be that I am calling a cell that falls outside the range of cells and rows in the datagridview but rather that it seems to think that nothing is there at all. But if I disable the line of code that is supposed to return the value - when I run the program I can see the table just fine and the data is there.
If I have to take the time to define the datagridview just so I can call a value from a cell, then what was the point in setting up databases, datasets, datasources and tables in the first place?
2nd EDIT - What do I have to do to get this looked at again?
Somebody has stated this is an exact duplicate of another question, and linked me to it. It is NOT an exact duplicate. I am NOT trying to reference a cell that falls outside of the datagridview.
The problem seems to be that the datagridview is not properly reporting the data contained within it. Presumably I have missed some sort of initialisation, or there is something wrong with the binding... I don't know. If I knew, it wouldn't be a problem. The linked question does nothing to explain what I am missing.
Please can I get some help with this?
Please?
3rd Edit - Since my original question, I've been looking around for a solution. I came across a tutorial carrying out a similar exercise, and it included the following line of code:
islandMapDataGridView.SelectionMode = DataGridViewSelectionMode.CellSelect;
I hoped this might resolve my problem, but sadly it did not.
I am guessing that something fundamental is missing that allows me to report values from the datagrid, or it is somehow not properly loaded or configured.
I would appreciate any help, even just a pointer to a tutorial or similar that explains how to set this sort of thing up.
Do I need to set up a class in order to fetch a value from a cell? Do I need to set up a database context? Please, what am I missing here?
4th Edit!
OK, here is the code that was generated automatically when I dragged the 'datagridview' and 'details' from the table in Data Sources.
private void islandMapBindingNavigatorSaveItem_Click(object sender, EventArgs e)
{
    this.Validate();
    this.islandMapBindingSource.EndEdit();
    this.tableAdapterManager.UpdateAll(this.starterIslandDataSet);
}
private void Form4_Load(object sender, EventArgs e)
{
    // TODO: This line of code loads data into the 'starterIslandDataSet.IslandMap' table. You can move, or remove it, as needed.
    this.islandMapTableAdapter.Fill(this.starterIslandDataSet.IslandMap);
}
There doesn't appear to be anything specifically for the datagridview here, but again, when I run the program it displays fine, including all the information I entered into the table.
Here is the code I am trying to use to pull info from the database:
private void MapTilesTest()
{
    islandMapDataGridView.SelectionMode = DataGridViewSelectionMode.CellSelect;
    foreach (Control control in tableLayoutPanel2.Controls)
    {
        PictureBox maptile = control as PictureBox;
        if (maptile != null)
        {
            int testValue1 = (int)islandMapDataGridView[1, 1].Value;
            Console.WriteLine(testValue1);
        }
    }
}
I am using Console.WriteLine just to confirm that I can indeed pull a value from the table.
I have looked at tutorials for advice, and I see many where the datagridview is coded manually and the data is entered into it in code. I assume this is one way of doing it, but I can't find a good example where someone pulls information from a database the same way I am.
After reading the comments, it may be helpful to clarify some major points in what you are trying to accomplish. First is the reality that when you try to access a cell’s value in a grid, it is imperative that you check several possibilities.
First, does the cell actually exist? In your posted example, you are trying to access the value from the cell at row 1 column 1. This can be accomplished by simply checking to make sure the grid has at least two (2) rows and at least two (2) columns. If you do not check this, and the grid has less than two (2) rows or less than two (2) columns… your code will crash and burn. This obviously is a bad thing and “you” as a programmer should ALWAYS assume the possibility that the cell you are looking for does not exist, is null or the cells value is not a number.
If you have not done so yet, I recommend you take a look at the Try/Catch construct. This useful construct will allow you to TRY some part of your code and CATCH any errors… like an index out of range error. The try catch construct is useful to catch errors and respond to them. In your posted code there would be three possible errors in the line int testValue1 = (int)islandMapDataGridView[1, 1].Value; And you would want to catch all these possible errors in the following order.
1) The cell at row 1 column 1 does not exist.
2) The cell's “Value” at column 1 row 1 is null.
3) The cell's “Value” at column 1 row 1 is not a number.
The line int testValue1 = (int)islandMapDataGridView[1, 1].Value; could produce one of the three errors above. If the cell does not exist, is null or is not a number, then there would be no value to output.
The convenient aspect of Try/Catch is that you can “catch” all errors or only specific ones. For example, the code below catches each of the three possible errors separately.
private void MapTilesTest() {
    try {
        int testValue1 = (int)islandMapDataGridView[1, 1].Value;
        MessageBox.Show("Value is: " + testValue1.ToString());
    }
    catch (ArgumentOutOfRangeException ex) {
        MessageBox.Show("Cell 1,1 does not exist: total rows = " + islandMapDataGridView.Rows.Count + " total columns = " + islandMapDataGridView.Columns.Count);
    }
    catch (NullReferenceException ex) {
        MessageBox.Show("The 'Value' at Cell 1,1 is null (most likely the new row): total rows = " + islandMapDataGridView.Rows.Count + " total columns = " + islandMapDataGridView.Columns.Count);
    }
    catch (InvalidCastException ex) {
        MessageBox.Show("The 'Value' at Cell 1,1 is NOT a number or is an empty cell: actual cell value is: " + islandMapDataGridView[1, 1].Value.ToString());
    }
    catch (Exception ex) {
        MessageBox.Show("Some other error: " + ex.GetBaseException());
    }
}
A second approach is to check for these errors manually. The code below demonstrates this and uses int.TryParse to validate that the value in the cell is actually a number. TryParse is a better way to convert strings to numbers, because it validates the string instead of throwing. Casting the cell's Value to an int, as your code does, ALWAYS has the possibility of failure and should ALWAYS be checked; TryParse helps with that checking.
The code below still uses a Try/Catch however the difference is that the code below is manually checking for the specific errors: index out of range, a null cell value and valid integer string values. These error checks are made BEFORE they happen and therefore the exceptions will not be thrown. If there are ANY other exceptions thrown… the catch clause will get them.
private void MapTilesTest() {
    try {
        // make sure there are enough rows
        if (islandMapDataGridView.Rows.Count < 2) {
            MessageBox.Show("Row Index is out of range: total rows = " + islandMapDataGridView.Rows.Count);
            return;
        }
        // make sure there are enough columns
        if (islandMapDataGridView.Columns.Count < 2) {
            MessageBox.Show("Column Index is out of range: total columns = " + islandMapDataGridView.Columns.Count);
            return;
        }
        // make sure the cell's value is not null
        if (islandMapDataGridView[1, 1].Value == null) {
            MessageBox.Show("Value is null (most likely the new row)");
            return;
        }
        // make sure the cell's value is actually an integer
        int testValue1 = 0;
        if (int.TryParse(islandMapDataGridView[1, 1].Value.ToString(), out testValue1)) {
            MessageBox.Show("Valid Value is: " + testValue1.ToString());
        }
        else {
            MessageBox.Show("Value is not a number: " + islandMapDataGridView[1, 1].Value.ToString());
        }
    }
    catch (Exception ex) {
        MessageBox.Show("Error: " + ex.Message);
    }
}
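The value checks above can also be gathered into a small helper. This is a sketch, not part of the answer: CellValues.TryGetInt is a hypothetical name, and it takes the cell's Value as a plain object so it can be exercised without a form. It also treats DBNull.Value as empty, since a grid bound to a DataTable typically reports database nulls that way rather than as null.

```csharp
using System;
using System.Globalization;

public static class CellValues
{
    // Tries to turn a grid cell's Value into an int, covering the cases the
    // answer lists: a missing value (null or DBNull.Value), a value that is
    // already an int, and anything else, which is parsed from its text form.
    public static bool TryGetInt(object value, out int result)
    {
        result = 0;
        if (value == null || value is DBNull)
            return false;
        if (value is int i)
        {
            result = i;
            return true;
        }
        // Covers strings and other numeric types (e.g. long) without a risky cast.
        return int.TryParse(Convert.ToString(value, CultureInfo.InvariantCulture),
                            NumberStyles.Integer, CultureInfo.InvariantCulture, out result);
    }
}
```

On the form it would be called only after the row and column count checks, for example CellValues.TryGetInt(islandMapDataGridView[1, 1].Value, out int testValue1).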
I hope this makes sense. Lastly, I can assure you that the error you are getting comes from when the data is being filled into the grid. When the first row is filled with data, your code will crash without these checks, and this is exactly what is happening. When I asked where the line is being called, your answer that it is called in the method MapTilesTest does not help. We need to know WHEN MapTilesTest is called in relation to when the grid is filled. I can only assume that MapTilesTest is called before or during the filling of the grid, which would cause the error you describe as soon as the first row is added.
I am working with a client to import a rather large Excel file (over 37K rows) into a custom system, using the excellent LinqToExcel library to do so. While reading all of the data in, I noticed it was breaking on records about 80% of the way in and dug a little further. It fails because, while the majority of records (with associated dates ranging 2011 - 2015) are normal, e.g. 1/3/2015, starting in 2016 the format changes to look like this: '1/4/2016 (note the "tick" at the beginning of the date), and LinqToExcel starts returning a DBNull for that column.
Any ideas on why it would do that and ways around it? Note that this isn't a casting issue - I can use the Immediate Window to see all the values of the LinqToExcel.Row value and where that column index is, it's empty.
Edit
Here is the code I am using to read in the file:
var excel = new LinqToExcel.ExcelQueryFactory(Path.Combine(this.FilePath, this.CurrentFilename));
foreach (var row in excel.Worksheet(file.WorksheetName))
{
    data.Add(this.FillEntity(row));
}
The problem I'm referring to is inside the row variable, which is a LinqToExcel.Row instance and contains the raw data from Excel. The values inside row all line up, with the exception of the column for the date which is empty.
** Edit 2 **
I downloaded the LinqToExcel code from GitHub and connected it to my project, and it looks like the issue is even deeper than this library. It uses an IDataReader to read in all of the values, and the cells in question are already empty at that level. Here is the block of code from the LinqToExcel.ExcelQueryExecutor class that is failing:
private IEnumerable<object> GetRowResults(IDataReader data, IEnumerable<string> columns)
{
    var results = new List<object>();
    var columnIndexMapping = new Dictionary<string, int>();
    for (var i = 0; i < columns.Count(); i++)
        columnIndexMapping[columns.ElementAt(i)] = i;
    while (data.Read())
    {
        IList<Cell> cells = new List<Cell>();
        for (var i = 0; i < columns.Count(); i++)
        {
            var value = data[i];
            //I added this in, since the worksheet has over 37K rows and
            //I needed to snag right before it hit the values I was looking for
            //to see what the IDataReader was exposing. The row inside the
            //IDataReader relevant to the column I'm referencing is null,
            //even though the data definitely exists in the Excel file
            if (value.GetType() == typeof(DateTime) && value.Cast<DateTime>() == new DateTime(2015, 12, 31))
            {
            }
            value = TrimStringValue(value);
            cells.Add(new Cell(value));
        }
        results.CallMethod("Add", new Row(cells, columnIndexMapping));
    }
    return results.AsEnumerable();
}
Since their class uses an OleDbDataReader to retrieve the results, I think that is what can't find the value of the cell in question. I don't even know where to go from there.
Found it! Once I traced the failure to the OleDbDataReader rather than the LinqToExcel library itself, it sent me down a different path. When an Excel file is read through the OLE DB driver (as LinqToExcel and many similar utilities do under the covers), the driver samples the first rows of each column (8 by default, controlled by the TypeGuessRows setting) to decide the column's type. In my file the early records all had "normal" dates, so the column was typed as a date. Once it reached the "bad" records, the ' in front of the date meant the value couldn't be read as a date, so it came back null.
To work around this, I load the file and tell it not to treat the first row as column headers. Since the header for this column is a string and most of the values are dates, the mismatched types make the column come back as text, and the values I need load properly. From there, I can parse them accordingly.
Source: What is IMEX in the OLEDB connection string?
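A minimal sketch of the workaround, assuming the workbook is opened through the ACE OLE DB provider directly; the provider name and the ExcelDateText helper are assumptions, not the answer's code. HDR=NO makes the text header part of the data so the column is mixed-type, and IMEX=1 (the setting discussed in the linked source) asks the driver to read mixed-type columns as text. The parser then strips a leading tick in case one comes through, and assumes month/day/year dates such as 1/4/2016.

```csharp
using System;
using System.Globalization;

public static class ExcelDateText
{
    // Connection string for the ACE OLE DB provider with HDR=NO (first row is
    // data, not headers) and IMEX=1 (mixed-type columns are read as text).
    // Assumption: .xlsx file and the ACE 12.0 provider is installed.
    public static string BuildConnectionString(string path) =>
        "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + path +
        ";Extended Properties=\"Excel 12.0 Xml;HDR=NO;IMEX=1\"";

    // Parses a date cell that was read as text, e.g. "1/3/2015" or "'1/4/2016".
    // Assumes US month/day/year ordering.
    public static bool TryParseCellDate(string text, out DateTime date)
    {
        string cleaned = (text ?? string.Empty).Trim().TrimStart('\'');
        return DateTime.TryParseExact(cleaned, "M/d/yyyy", CultureInfo.InvariantCulture,
                                      DateTimeStyles.None, out date);
    }
}
```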
I have an application that reads an entire Excel file into a datatable. I need to retrieve subsets of data from this table into separate datatables. I do this by looping down the cells of the Excel table until I find a blank cell.
The problem is that the last line of the longest column of the Excel table (in other words, the last row of the Excel table) always errors out, with "There is no row at position " followed by whatever the last row of the longest column is.
Here is a scaled-down version of my code that gives me the error:
do {
string MyString = dtExcel.Rows[i][11].ToString();
} while (dt.Rows[i][11].ToString().Length > 0);
Where i is the row counter and [11] is the column I need to save. It works perfectly until the last row of the longest column, and then bombs out.
I've tried checking to see if dtExcel.Rows[i][11] is null, or if the ToString() length is zero, but I can't figure out how to trap this error because the mere act of trying to read it causes the error.
I guess my question is, is there a way of checking to see if this row even exists before I try to check it for null or turn it into a string, or whatever?
Hopefully this is clear. Thanks for any help.
Instead of using a while loop that checks whether the row is empty, use
for (int i = 0; i < dt.Rows.Count; i++) {
    //...
}
to loop through the rows.
That way the loop knows in advance when to stop, and you will not get the "There is no row at position" IndexOutOfRangeException.
Alternatively, read your DataTable this way (StringBuilder is in System.Text):
StringBuilder sb = new StringBuilder();
foreach (DataRow r in dt.Rows)
{
    if (r.IsNull(11) || string.IsNullOrEmpty(r[11].ToString())) break;
    sb.Append(r[11]);
}
return sb.ToString();
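Combining both answers, here is a self-contained sketch (hypothetical ColumnReader.ReadUntilBlank) that walks one column from the top and stops at the first blank cell or at the last row, whichever comes first. It treats both DBNull.Value and an empty string as blank; in the question the column index would be 11.

```csharp
using System.Collections.Generic;
using System.Data;

public static class ColumnReader
{
    // Reads column `col` from the top of the table until the first blank cell
    // or the last row. Bounding the loop by Rows.Count means Rows[i] is never
    // asked for a row that doesn't exist.
    public static List<string> ReadUntilBlank(DataTable dt, int col)
    {
        var values = new List<string>();
        for (int i = 0; i < dt.Rows.Count; i++)
        {
            DataRow row = dt.Rows[i];
            if (row.IsNull(col)) break;          // DBNull.Value in the cell
            string text = row[col].ToString();
            if (text.Length == 0) break;         // empty string in the cell
            values.Add(text);
        }
        return values;
    }
}
```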
I'm trying to set a cell of type "System.Int32" in a DataSet in code, and my attempt looks like this:
int aCarID = 5; // as an example...
// points out the row that I want to manipulate - I guess this is what doesn't work???
int insertIndex = myDataSet.tableCars.Rows.Count;
myDataSet.tableCars.Rows[insertIndex]["CarID"] = aCarID;
What happens is: I get an exception of "System.IndexOutOfRangeException".
You're allowed to say that I'm stupid as long as you provide an answer...
UPDATE!
Yes, I'm trying to create a new row, that's true - that's why I'm not using "-1".
So what's the syntax to create a new row?
If I use tableCars.Rows.Add(...) I need to supply a "DataRow Row" to the Add-function, and I don't have one to provide - yet! (Catch 22)
NEW UPDATE!
Ooops, found it - "NewRow()" :-)
You do realize that indices start with zero in C#? That means if your table has 3 rows, you're trying to access the 4th row because insertIndex = 3.
Try insertIndex - 1.
Edit: Since you're trying to add a new row and have already found out how to do so, don't forget to save those changes to the database (I assume that's what you want to do). The simplest way is to set the InsertCommand property (used for new rows; UpdateCommand is used for modified rows) of the DataAdapter you used to fill the DataSet (or rather the DataTable in the DataSet), and then call its Update method.
You can also have these commands generated, using a subclass of DbCommandBuilder.
This is a classic off-by-one: valid rows are at indices 0 to Rows.Count - 1.
If you want to make a new row, create it with tableCars.NewRow() and add it with tableCars.Rows.Add(row) first.
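A short sketch of the off-by-one and the fix, using a plain (untyped) DataTable with a CarID column as an assumption; CarRows.AddCar is a hypothetical helper. Rows[Rows.Count] is one past the end, so the row is created with NewRow(), appended with Rows.Add(), and only then indexed.

```csharp
using System.Data;

public static class CarRows
{
    // Appends a new row with the given CarID and returns its index.
    public static int AddCar(DataTable tableCars, int carId)
    {
        DataRow row = tableCars.NewRow();   // detached row with the table's schema
        row["CarID"] = carId;
        tableCars.Rows.Add(row);            // now part of the table (RowState = Added)
        return tableCars.Rows.Count - 1;    // last valid index
    }
}
```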
From MSDN:
An IndexOutOfRangeException exception is thrown when an attempt is made to access an element of an array or collection with an index that is outside the bounds of the array or less than zero.
So the problem is the wrong index, as Christian said.
Either create the new row first, because you are trying to access a row that doesn't exist, or write your data into the last existing row, at index insertIndex - 1.
DataRow indexes start at 0, as in arrays.
You're using a strongly typed dataset, but your insert code is written for an untyped dataset.
The following code will work for you (and is much easier!):
var insertRow = myDataSet.tableCars.NewtableCarsRow();
insertRow.CarID = aCarID;
myDataSet.tableCars.AddtableCarsRow(insertRow);
That's it! NewtableCarsRow() only creates a detached row; AddtableCarsRow() adds it to the table. Don't call AcceptChanges() before saving: it marks the row as Unchanged, so a DataAdapter would no longer insert it.
NOTE: the var keyword requires C# 3.0 (.NET Framework 3.5) or later. For earlier versions, replace var with tableCarsRow (I'm assuming that you didn't customize the default name for the DataRow in the DataSet designer).
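To illustrate why AcceptChanges() should not be called before saving, this sketch (plain DataTable, hypothetical RowStateDemo class) shows the new row's RowState changing from Added to Unchanged. A DataAdapter's Update() issues INSERTs only for rows in the Added state, so after AcceptChanges() the row would not be saved.

```csharp
using System.Data;

public static class RowStateDemo
{
    // Adds a row and reports its RowState before and after AcceptChanges().
    public static (DataRowState BeforeAccept, DataRowState AfterAccept) Demo()
    {
        var tableCars = new DataTable("tableCars");
        tableCars.Columns.Add("CarID", typeof(int));
        DataRow row = tableCars.NewRow();
        row["CarID"] = 5;
        tableCars.Rows.Add(row);
        DataRowState before = row.RowState;   // Added: an adapter would INSERT it
        tableCars.AcceptChanges();
        return (before, row.RowState);        // Unchanged: an adapter would skip it
    }
}
```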