I have an application that reads an entire Excel file into a datatable. I need to retrieve subsets of data from this table into separate datatables. I do this by looping down the cells of the Excel table until I find a blank cell.
The problem is that the last line of the longest column of the Excel table (in other words, the last row of the Excel table) always errors out, with "There is no row at position " followed by whatever the last row of the longest column is.
Here is a scaled-down version of my code that gives me the error:
do {
    string MyString = dtExcel.Rows[i][11].ToString();
    i++;
} while (dtExcel.Rows[i][11].ToString().Length > 0);
Where i is the row counter and [11] is the column I need to save. It works perfectly until the last row of the longest column, and then bombs out.
I've tried checking to see if dtExcel.Rows[i][11] is null, or if the ToString() length is zero, but I can't figure out how to trap this error because the mere act of trying to read it causes the error.
I guess my question is, is there a way of checking to see if this row even exists before I try to check it for null or turn it into a string, or whatever?
Hopefully this is clear. Thanks for any help.
Instead of using a while loop that checks whether the row is empty, use
for (int i = 0; i < dt.Rows.Count; i++)
{
    //...
}
to loop through the rows.
Because the loop bound is dt.Rows.Count, the loop stops after the last row, and you will not get an IndexOutOfRangeException ("There is no row at position ...").
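A minimal sketch of that idea, combined with the asker's "stop at the first blank cell" rule. The helper name ReadColumnUntilBlank and the sample table are hypothetical; the point is only that the row index is checked against Rows.Count before the cell is read.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class ColumnReader
{
    // Hypothetical helper: collects values from one column, stopping at the
    // first blank cell or at the end of the table, whichever comes first.
    public static List<string> ReadColumnUntilBlank(DataTable table, int column)
    {
        var values = new List<string>();
        for (int i = 0; i < table.Rows.Count; i++)
        {
            // DBNull.Value.ToString() returns "", so empty cells count as blank.
            string cell = table.Rows[i][column].ToString();
            if (cell.Length == 0) break;
            values.Add(cell);
        }
        return values;
    }

    public static void Main()
    {
        var table = new DataTable();
        table.Columns.Add("Value", typeof(string));
        table.Rows.Add("a");
        table.Rows.Add("b"); // last row of the table: no blank cell follows it

        List<string> values = ReadColumnUntilBlank(table, 0);
        Console.WriteLine(string.Join(",", values)); // prints "a,b"
    }
}
```

The loop ends cleanly on the last row because the bound check happens before any cell access.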
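Try reading your DataTable this way:
StringBuilder sb = new StringBuilder();
foreach (DataRow r in dt.AsEnumerable())
{
    // r[11] is an object, so convert it to a string before testing for blank
    string value = r[11].ToString();
    if (string.IsNullOrEmpty(value)) break;
    sb.Append(value);
}
return sb.ToString();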
I am having an error when trying to delete thousands of rows from an excel file.
I am using EPPlus in C# to do the modifying of the data.
This is the code that I am running to do the deletion:
rowsToDelete.Reverse();
foreach (var row in rowsToDelete)
{
    try
    {
        worksheet.DeleteRow(row);
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
    }
}
'rowsToDelete' is acquired in a previous section of code that just gets the row numbers that we need to delete based on the data that is in the worksheet.
'rowsToDelete' is a List that is defined as such:
List<int> rowsToDelete = new List<int>();
The error that I am getting is:
Source array was not long enough. Check the source index, length, and the array's lower bounds. (Parameter 'sourceArray')
The Excel file has 65152 rows, and in this case I am trying to delete 65147 of them. Normally I won't delete this many rows, but this time I need to.
I noticed a pattern in which rows throw the error: the same exception is caught on every 1024th row.
I thought that maybe it is the way I was deleting every row individually so I swapped it to delete in a large group by using:
worksheet.DeleteRow(rowFrom, rows, true);
But the same error would happen.
The Reverse() function above on rowsToDelete just makes sure we are deleting from the bottom up so it doesn't cause any issues where rows shift and you delete the wrong row.
Any help would be much appreciated!
While gathering more information for the question, I found my answer.
I was using:
List<int> rowsToDelete = new List<int>();
to initialize the List. I then started experimenting with giving it a capacity on initialization, like this:
List<int> rowsToDelete = new List<int>(maxRowNumber);
where I set 'maxRowNumber' equal to 1,000,000.
Making this change stopped the error from being thrown altogether.
I am working with a client to import a rather large Excel file (over 37K rows) into a custom system, using the excellent LinqToExcel library to do so. While reading the data in, I noticed it was breaking on records about 80% of the way in, so I dug a little further. Most records (with associated dates ranging from 2011 to 2015) are normal, e.g. 1/3/2015. Starting in 2016, however, the values look like this: '1/4/2016 (note the "tick" at the beginning of the date), and LinqToExcel starts returning a DBNull for that column.
Any ideas on why it would do that and ways around it? Note that this isn't a casting issue - I can use the Immediate Window to see all the values of the LinqToExcel.Row value and where that column index is, it's empty.
Edit
Here is the code I am using to read in the file:
var excel = new LinqToExcel.ExcelQueryFactory(Path.Combine(this.FilePath, this.CurrentFilename));
foreach (var row in excel.Worksheet(file.WorksheetName))
{
    data.Add(this.FillEntity(row));
}
The problem I'm referring to is inside the row variable, which is a LinqToExcel.Row instance and contains the raw data from Excel. The values inside row all line up, with the exception of the column for the date which is empty.
Edit 2
I downloaded the LinqToExcel code from GitHub and connected it to my project, and it looks like the issue is even deeper than this library. It uses an IDataReader to read in all of the values, and the cells in question that aren't being read are empty from that level. Here is the block of code from the LinqToExcel.ExcelQueryExecutor class that is failing:
private IEnumerable<object> GetRowResults(IDataReader data, IEnumerable<string> columns)
{
    var results = new List<object>();
    var columnIndexMapping = new Dictionary<string, int>();
    for (var i = 0; i < columns.Count(); i++)
        columnIndexMapping[columns.ElementAt(i)] = i;
    while (data.Read())
    {
        IList<Cell> cells = new List<Cell>();
        for (var i = 0; i < columns.Count(); i++)
        {
            var value = data[i];
            //I added this in, since the worksheet has over 37K rows and
            //I needed to snag right before it hit the values I was looking for
            //to see what the IDataReader was exposing. The row inside the
            //IDataReader relevant to the column I'm referencing is null,
            //even though the data definitely exists in the Excel file
            if (value.GetType() == typeof(DateTime) && value.Cast<DateTime>() == new DateTime(2015, 12, 31))
            {
            }
            value = TrimStringValue(value);
            cells.Add(new Cell(value));
        }
        results.CallMethod("Add", new Row(cells, columnIndexMapping));
    }
    return results.AsEnumerable();
}
Since their class uses an OleDbDataReader to retrieve the results, I think that is what can't find the value of the cell in question. I don't even know where to go from there.
Found it! Once I traced down that it was the OleDbDataReader that was failing and not the LinqToExcel library itself, it sent me down a different path to look around. Apparently, when an Excel file is read by an OleDbDataReader (as virtually all utilities do under the covers), the first few records are scanned to determine the type of content associated with the column. In my scenario, over 20K records had "normal" dates, so it assumed everything was a date. Once it got to the "bad" records, the ' in front of the date meant it couldn't be parsed into a date, so the value was null.
To circumvent this, I load the file and tell it to ignore column headers. Since the header for this column is a string and most of the values are dates, it makes everything a string because of the mismatched types and the values I need are loaded properly. From there, I can parse accordingly and get it to work.
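For readers going through OLE DB directly, the two settings usually involved are HDR and IMEX in the connection string's Extended Properties. The following is only a sketch: it assumes the ACE OLE DB provider and an .xlsx file, the helper name and the file path are hypothetical, and how the equivalent options are passed through LinqToExcel is not shown in the original post.

```csharp
using System;

public static class ExcelConnection
{
    // Hypothetical helper. HDR=NO treats the first row as data rather than
    // as column names; IMEX=1 asks the driver to read mixed-type columns as text.
    public static string BuildConnectionString(string path)
    {
        return "Provider=Microsoft.ACE.OLEDB.12.0;" +
               "Data Source=" + path + ";" +
               "Extended Properties=\"Excel 12.0 Xml;HDR=NO;IMEX=1\"";
    }

    public static void Main()
    {
        // The path is a placeholder; nothing is opened here.
        Console.WriteLine(BuildConnectionString(@"C:\data\import.xlsx"));
    }
}
```

With HDR=NO the header row comes back as the first data row, so it has to be skipped when the values are parsed afterwards.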
Source: What is IMEX in the OLEDB connection string?
I calculate the number of rows I want in my second column with a for loop, based on how many records the opened file contains. I have researched and tried various solutions, but nothing works, yet it seems so simple. Below is my current code, where I read the file's length, do a quick calculation, and enter a for loop that (at the moment) only populates the first column.
long Count = 1;
FileInfo Fi = new FileInfo(file);
long sum = (Fi.Length / 1024) - Count;
for (int i = 0; i < sum; i++)
{
    DataGridView1.Rows.Add(Count++);
}
I'm not sure how to do it but I know the above code adds to the first column by default - I don't know how to modify it. I know by:
DataGridView1.Rows.Add("a","b");
... The 'b' value is displayed in the second column, but I don't want anything for now in the first where 'a' is.
I have looked at insert a row with one column datagridview c# but it is related to merging columns, again, I don't want this.
DataGridView1.Rows.Add("",Count++);
Works to an extent, but is not the right way to do it. I'm going to be adding data to the first column later on.
If you want to omit the value for the first column, just add null or DBNull.Value, e.g.:
DataGridView1.Rows.Add(DBNull.Value, Count++);
This way, the first column will be empty while the second column contains the value of Count.
When I retrieve data into my DataSet, the year column is the 3rd column. I do not need this column after using it to do some calculations, so I remove it by passing the column's name to the Remove method. The problem is that I retrieve the data in a foreach loop, so when I retrieve the data again, the year column is now the last column in the DataSet, and when I try to access it, it throws an error saying the year column was not found. My workaround was to clone my DataSet into a DataTable and then import each row from the DataSet into that DataTable, but is there a more efficient way, or a way to keep the year column in its original position?
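private int GetData()
{
    dataSet.GetExportData();
    DataTable dt = dataSet.ExportData;
    for (int i = 0; i < dt.Rows.Count; i++)
    {
        //Do Stuff
    }
    // Remove the year column from the table once the calculations are done
    dt.Columns.Remove(dataSet.ExportData.YearColumn.ColumnName);
    // (return statement omitted in this excerpt)
}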
In the above code, when I open up a dialog and select data to export, it works the first time, but if I leave the dialog open and click the button to export again, it throws the error. If I close the dialog and reopen, it works fine. As I said before I noticed when it retrieves the data again on the button click with dataSet.GetExportData(), it puts the Year column as the last column instead of the defined position in the strongly-typed DataSet which I assume is the problem, but can't figure out how to fix it besides doing a Clone and Import.
This would be better as a comment, but I want to take advantage of code formatting. It's also hard to say exactly what you want to achieve without seeing the code.
foreach (DataRow row in myTable.Rows)
{
    //do something with year column
    myTable.Columns.Remove("year");
}
The code above works on the first iteration but fails afterwards, because myTable.Columns.Remove("year") removes the whole column from the table, not just from the current row. You'll get a "Column 'year' does not belong to table Table." exception.
Why not take myTable.Columns.Remove("year"); out of the foreach instead?
foreach (DataRow row in myTable.Rows)
{
    //do something with year column
}
myTable.Columns.Remove("year");
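A self-contained sketch of that pattern follows. The helper name, the sample data, and the summing are hypothetical stand-ins for "do something with year column". The Columns.Contains guard is an addition that is not in the answer; it makes a repeated call harmless if the column has already been removed.

```csharp
using System;
using System.Data;

public static class YearColumnDemo
{
    // Hypothetical helper: uses the "year" column for a calculation,
    // then drops the column once, after the loop has finished.
    public static int SumYearsThenRemove(DataTable table)
    {
        int total = 0;
        if (!table.Columns.Contains("year")) return total;

        foreach (DataRow row in table.Rows)
        {
            total += (int)row["year"];
        }
        table.Columns.Remove("year");
        return total;
    }

    public static void Main()
    {
        var table = new DataTable();
        table.Columns.Add("name", typeof(string));
        table.Columns.Add("year", typeof(int));
        table.Rows.Add("a", 2015);
        table.Rows.Add("b", 2016);

        int total = SumYearsThenRemove(table);
        Console.WriteLine(total + " " + table.Columns.Count); // prints "4031 1"
    }
}
```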
I'm looking to get a column and all its row data from a DataTable object, then create a new column in another DataTable and fill it with that column's rows. The issue I keep encountering is that the row data comes out, as do the column names, but all of the rows are appended to the same column. I'm sure I'm missing something obvious; in fact, I know I am! Any help is greatly appreciated.
void GetColumns()
{
    // add the columns to the new datatable
    foreach (int i in mapValues)
    {
        SplitData.Columns.Add(i.ToString());
    }
    // map values contains the index numbers of my target columns
    foreach (int x in mapValues)
    {
        foreach (DataRow row in OrigData.Rows)
        {
            SplitData.Rows.Add(row[mapValues[LocalIndex]]);
        }
        LocalIndex++;
    }
}
The Rows.Add overload that you are using is the params one (DataRowCollection.Add(params object[] values)), so each call creates a new row and puts your original column data in the first column of the new DataTable.
You probably want something like:
DataRow newRow = SplitData.NewRow(); // gets a new blank row with the right schema
newRow[x.ToString()] = row[mapValues[LocalIndex]]; // sets the column (that you created before) to the orig data
SplitData.Rows.Add(newRow);
as the core of your second for loop. You might as well do it in one loop too.
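One way the single-loop version could look, as a sketch only. The helper name CopyColumns and the sample tables are hypothetical, and unlike the question's code it reuses the source column names and types instead of naming the new columns after their indexes.

```csharp
using System;
using System.Data;

public static class ColumnSplitter
{
    // Hypothetical helper: copies the columns at the given indexes into a
    // new table, building one complete row per source row.
    public static DataTable CopyColumns(DataTable source, int[] columnIndexes)
    {
        var result = new DataTable();
        foreach (int index in columnIndexes)
        {
            DataColumn column = source.Columns[index];
            result.Columns.Add(column.ColumnName, column.DataType);
        }

        foreach (DataRow row in source.Rows)
        {
            DataRow newRow = result.NewRow();
            for (int j = 0; j < columnIndexes.Length; j++)
            {
                newRow[j] = row[columnIndexes[j]];
            }
            result.Rows.Add(newRow);
        }
        return result;
    }

    public static void Main()
    {
        var source = new DataTable();
        source.Columns.Add("A", typeof(int));
        source.Columns.Add("B", typeof(int));
        source.Columns.Add("C", typeof(int));
        source.Rows.Add(1, 2, 3);
        source.Rows.Add(4, 5, 6);

        DataTable split = CopyColumns(source, new[] { 0, 2 });
        Console.WriteLine(split.Rows[1][1]); // prints "6"
    }
}
```

Filling every target column on the same new row avoids the staggered layout that results from adding one row per cell.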
Although the accepted answer was totally right and I learned something from it, it turns out the DataTable.ImportRow method is exactly what I want for my needs, so just for future reference for anyone who might stumble upon this.