Skip certain Rows and Columns while Parsing XLS


I'm using the following code to parse an XLS file with ExcelDataReader. I would like to exclude the first three rows, the first two columns, and any columns after column 9.
//create the reader
var reader = ExcelReaderFactory.CreateReader(stream);
var result = reader.AsDataSet();
//remove the first 3 rows
DataRowCollection dt = result.Tables[0].Rows;
dt.RemoveAt(0);
dt.RemoveAt(1);
dt.RemoveAt(2);
//exclude columns 1 and 2 and any columns after 9
for (int columnNumber = 2; columnNumber < 8; columnNumber++)
{
    foreach (DataRow dr in dt)
    {
        Debug.Log(dr[columnNumber].ToString());
        msg += dr[columnNumber].ToString();
    }
}
Unfortunately, it does not skip the rows and columns as expected. How do I skip specific columns and rows using ExcelDataReader?

You are doing the following:
dt.RemoveAt(0);
dt.RemoveAt(1);
dt.RemoveAt(2);
When the first line executes, the row at index 0 is removed and the remaining rows are reindexed: the original row 1 becomes row 0, row 2 becomes row 1, and so on.
When the second line executes, index 1 now refers to the row that was originally at position 2, so that row is removed and the rows are reindexed again.
When the third line executes, index 2 refers to the row that was originally at position 4, so once again the wrong row is removed.
As a result, when this process completes, it will have removed the rows that were originally at positions 0, 2, and 4 instead of 0, 1, and 2.
Change the code to remove the correct rows (for example, call RemoveAt(0) three times), or skip the first three rows with LINQ or a for loop.
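A minimal sketch of the first option, removing the correct rows in place. The class and method names (RowTrimmer, RemoveLeadingRows) are made up for illustration; only DataTable and DataRowCollection.RemoveAt come from System.Data.

```csharp
using System.Data;

public static class RowTrimmer
{
    // Removes up to `count` rows from the top of the table, in place.
    public static void RemoveLeadingRows(DataTable table, int count)
    {
        for (int i = 0; i < count && table.Rows.Count > 0; i++)
        {
            // Always remove index 0: after each removal the next row
            // slides up into position 0.
            table.Rows.RemoveAt(0);
        }
    }
}
```

With the question's variables this would be called as RowTrimmer.RemoveLeadingRows(result.Tables[0], 3).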
Sample using a for loop (not tested):
//create the reader
var reader = ExcelReaderFactory.CreateReader(stream);
var result = reader.AsDataSet();
DataRowCollection dt = result.Tables[0].Rows;
//ignore the first 3 rows
for (int dataRowCount = 3; dataRowCount < dt.Count; dataRowCount++)
{
    //exclude columns 1 and 2 and any columns after 9
    for (int columnNumber = 2; columnNumber < 8; columnNumber++)
    {
        //index the row collection (dt) first, then the column
        Debug.Log(dt[dataRowCount][columnNumber].ToString());
        msg += dt[dataRowCount][columnNumber].ToString();
    }
}
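The LINQ option mentioned above could look roughly like this. It is a sketch, not tested against ExcelDataReader itself: SheetSlicer and ReadBlock are hypothetical names, and the method works on any DataTable, such as result.Tables[0]. Column indexes are zero-based and inclusive.

```csharp
using System;
using System.Data;
using System.Linq;
using System.Text;

public static class SheetSlicer
{
    // Concatenates the cells of every row after the first `skipRows` rows,
    // restricted to columns firstColumn..lastColumn (zero-based, inclusive).
    public static string ReadBlock(DataTable table, int skipRows, int firstColumn, int lastColumn)
    {
        var sb = new StringBuilder();
        // Clamp so a short sheet does not throw on a missing column.
        int last = Math.Min(lastColumn, table.Columns.Count - 1);
        foreach (DataRow row in table.Rows.Cast<DataRow>().Skip(skipRows))
        {
            for (int c = firstColumn; c <= last; c++)
            {
                sb.Append(row[c]);
            }
        }
        return sb.ToString();
    }
}
```

SheetSlicer.ReadBlock(result.Tables[0], 3, 2, 7) would cover the same rows and columns as the for loop above. Nothing is removed from the table, so there is no reindexing to worry about.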

Related

How to split DataTable rows using a for loop?

I have a DataTable like this:
And I want to write a for loop that shows debit and credit line on its own separate line like this:
Here is my unfinished code:
DataTable dt = new DataTable();
dt.Columns.Add("DEBIT", typeof(string));
dt.Columns.Add("CREDIT", typeof(string));
dt.Columns.Add("AMOUNT", typeof(double));
dt.Rows.Add("Debit1", "Credit1", 10);
dt.Rows.Add("Debit2", "Credit2", 8);
dt.Rows.Add("Debit3", "Credit3", 12);
for (int i=1; i <= dt.Rows.Count; i++)
{
//The first image (datatable) has three debit and credit lines are showing on the same line. Normally the debit line and credit line are showing on its own separate lines.
//With above given datatable I want to construct for loop that shows three debit lines and three credit lines as demonstrated in the second image. In this case it shows 6 lines
}
I would much appreciate it if you could help me with this.

Steps:
Loop over the rows in reverse (so that inserting rows does not shift the indexes you still have to visit).
Create a new row for the credit and fill it with the relevant data.
Remove the credit data from the original row.
Insert the new row in the position following the original row.
Something like this should do the trick:
for (int i = dt.Rows.Count - 1; i >= 0; i--)
{
    var row = dt.Rows[i];
    if (!string.IsNullOrEmpty(row["CREDIT"].ToString()))
    {
        var creditRow = dt.NewRow();
        creditRow["CREDIT"] = row["CREDIT"];
        creditRow["AMOUNT"] = row["AMOUNT"];
        row["CREDIT"] = string.Empty;
        dt.Rows.InsertAt(creditRow, i + 1);
    }
}

The source contains no DataRows. error when one iteration in for loop

I am writing a program in Visual Studio that reads an Excel file in a specific format, converts the data to a different format, and stores it in a database table.
Below is the part of my code where something strange happens:
//copy schema into new datatable
DataTable _longDataTable = _library.Clone();
foreach (DataRow drlibrary in _library.Rows)
{
    //count number of variables in a row
    string check = drlibrary["Check"].ToString();
    int varCount = check.Length - check.Replace("{", "").Length;
    int count_and = 0;
    if (check.Contains("and") || check.Contains("or"))
    {
        count_and = Regex.Matches(check, "and").Count;
        varCount = varCount - count_and;
    }
    //loop through number of counted variables in order to add rows to long datatable (one row per variable)
    for (int i = 1; i <= varCount; i++)
    {
        var newRow = _longDataTable.NewRow();
        newRow.ItemArray = drlibrary.ItemArray;
        string j = i.ToString();
        //fill variablename with variable number
        if (i < 10)
        {
            newRow["VariableName"] = "Variable0" + j;
        }
        else
        {
            newRow["VariableName"] = "Variable" + j;
        }
    }
}
When varCount equals 1, I get the following error message when I run the program after loading an Excel file:
The source contains no DataRows.
I don't know why I can't run the for loop with just one iteration. Can anyone help me?

Fastest Way to Loop through SQL Database column against Excel Column - C#

I have a SQL table with two columns: OldValue and NewValue. I have the same two columns in an Excel spreadsheet. I want to find the quickest way to iterate through both the database and the spreadsheet, checking whether the OldValue column in the database matches the OldValue column in the spreadsheet.
My logic iterates over the entire SQL column (333228 records) looking for a match against the Excel column, which has 153 000 rows. This iteration is performance heavy and takes hours without finishing; it ends up hanging. How can I do this quickly? 153 000 x 333228 is about 51 billion iterations, which is computationally intensive.
I read https://codereview.stackexchange.com/questions/47368/looping-through-an-excel-document-in-c but couldn't find what I was looking for. The code works and has already found 500 matches, but it's slow considering I need to get through 333228 records in the database.
List<sim_info> exel_sims = new List<sim_info>();
Microsoft.Office.Interop.Excel.Application Excel_app = new Microsoft.Office.Interop.Excel.Application();
Microsoft.Office.Interop.Excel.Workbooks work_books = Excel_app.Workbooks;
string excel_file_path = Application.StartupPath + "\\TestSample";
Microsoft.Office.Interop.Excel.Workbook work_book = work_books.Open(excel_file_path);
work_book.SaveAs(excel_file_path + ".csv", Microsoft.Office.Interop.Excel.XlFileFormat.xlCSVWindows);
Microsoft.Office.Interop.Excel.Sheets work_sheets = work_book.Worksheets;
Microsoft.Office.Interop.Excel.Worksheet work_sheet = (Microsoft.Office.Interop.Excel.Worksheet)work_sheets.get_Item(1);
for (int j = 2; j < work_sheet.Rows.Count; j++)
{
try
{
temp_sim_info.msisdn = cell_to_str(work_sheet.Cells[j, 1]).Trim();
temp_sim_info.mtn_new_number = cell_to_str(work_sheet.Cells[j, 8]).Trim();
temp_sim_info.status = cell_to_str(work_sheet.Cells[j, 9]).Trim();
if (temp_sim_info.msisdn.Length < 5 || temp_sim_info.mtn_new_number.Length > 15) //Valid cellphone number length contains 11 digits +27XXXXXXXXX / 14 digits for the new msisdn. This condition checks for invalid cellphone numbers
{
if (zero_count++ > 10)
break;
}
else
{
zero_count = 0;
exel_sims.Add(temp_sim_info);
if (exel_sims.Count % 10 == 0)
{
txtExcelLoading.Text = exel_sims.Count.ToString();
}
}
}
catch
{
if (zero_count++ > 10)
break;
}
// }
txtExcelLoading.Text = exel_sims.Count.ToString();
work_sheet.Columns.AutoFit();
for (int i = 0; i < TestTableInstance.Rows.Count; i++)
{
string db_oldNumbers = "";
string db_CellNumber = "";
if (!TestTableInstance.Rows[i].IsNull("OldNumber"))
db_oldNumbers = TestTableInstance[i].OldNumber;
else
db_oldNumbers = TestTableInstance[i].CellNumber;
if (!TestTableInstance.Rows[i].IsNull("CellNumber"))
db_CellNumber = temp_sim_info.mtn_new_number;
for (int k = 0; k < exel_sims.Count; k++)
{
sim_info sim_Result = exel_sims.Find(x => TestTableInstance[i].CellNumber == x.msisdn);
if (TestTableInstance[i].CellNumber == exel_sims[k].msisdn && sim_Result != null)
{
//If match found then do logic here
}
}
}
}
MessageBox.show("DONE");
TestTableInstance is a DataSet of the database loaded in memory. The inner loop iterates over the entire DB column for each record until it finds a match in the OldValue column of the spreadsheet.
My code works. It's tried and tested with an Excel sheet of 800 rows and a DB table of 1000 records, where it completes in under 5 minutes. But with hundreds of thousands of records it hangs for hours.

Exactly! Why the heck are you using C# for this? Load the Excel file into a temp table in your DB and do a comparison between your actual SQL table (which allegedly has all the data you have in the Excel file) and the temp table (or view). This kind of comparison should complete in a couple of seconds.
select *
from dbtest02.dbo.article d2
left join dbtest01.dbo.article d1 on d2.id=d1.id
The left join returns all rows from the left table "dbtest02.dbo.article", even if there are no matches in "dbtest01.dbo.article".
Or:
select * from dbtest02.dbo.article
except
select * from dbtest01.dbo.article
See the link below for some other ideas of how to do this.
https://www.mssqltips.com/sqlservertip/2779/ways-to-compare-and-find-differences-for-sql-server-tables-and-data/
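If the comparison has to stay in C#, a hash lookup is a different technique from the SQL join above that also avoids the nested loops: index the spreadsheet values once, then make a single pass over the database values. The sketch below assumes both columns have already been read into memory as strings; ValueMatcher, BuildLookup and FindMatches are hypothetical names, not part of the original code.

```csharp
using System;
using System.Collections.Generic;

public static class ValueMatcher
{
    // Index the spreadsheet rows once: OldValue -> NewValue.
    // If an OldValue appears more than once, the first occurrence wins.
    public static Dictionary<string, string> BuildLookup(IEnumerable<KeyValuePair<string, string>> excelRows)
    {
        var lookup = new Dictionary<string, string>(StringComparer.Ordinal);
        foreach (var pair in excelRows)
        {
            if (pair.Key != null && !lookup.ContainsKey(pair.Key))
            {
                lookup.Add(pair.Key, pair.Value);
            }
        }
        return lookup;
    }

    // Single pass over the database values; each lookup is O(1) on average,
    // so the total work is roughly (excel rows + db rows), not their product.
    public static List<KeyValuePair<string, string>> FindMatches(
        IEnumerable<string> dbOldValues, Dictionary<string, string> lookup)
    {
        var matches = new List<KeyValuePair<string, string>>();
        foreach (string oldValue in dbOldValues)
        {
            string newValue;
            if (oldValue != null && lookup.TryGetValue(oldValue, out newValue))
            {
                matches.Add(new KeyValuePair<string, string>(oldValue, newValue));
            }
        }
        return matches;
    }
}
```

Reading 153 000 cells one at a time through Interop is slow in its own right, so this only addresses the comparison step.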

Replicate record in a DataTable based on value of a column using C#

I have a record in a dataTable as shown below.
1 Test 7Dec2014 15:40 one,two,three
Since the last column has 3 comma-separated values, the resulting DataTable should look like the one below, with replicated records.
1 Test 7Dec2014 15:40 one
2 Test 7Dec2014 15:40 two
3 Test 7Dec2014 15:40 three
Please help me with an optimized way to achieve the above result.

The optimized way I found for the above problem is as below. If anybody has a better solution please let me know.
string[] strValues;
for (int i = 0; i < dtTable.Rows.Count; i++)
{
    strValues = dtTable.Rows[i]["Column_Name"].ToString().Split(',');
    if (strValues.Length > 1)
    {
        dtTable.Rows[i]["Column_Name"] = strValues[0];
        for (int j = 1; j < strValues.Length; j++)
        {
            var TargetRow = dtTable.NewRow();
            var OriginalRow = dtTable.Rows[i];
            TargetRow.ItemArray = OriginalRow.ItemArray.Clone() as object[];
            TargetRow["Column_Name"] = strValues[j];
            dtTable.Rows.Add(TargetRow);
        }
    }
}

get excel range with exact row, column positions

I've populated an Excel file starting at column 14. The important rows are, say, 2 to 30.
valueArray = (object[,])excelRange.get_Value(Excel.XlRangeValueDataType.xlRangeValueDefault);
I am using get_Value to get all the values in those rows and columns. The problem is that the returned array renumbers my columns: column 14 becomes 1, column 15 becomes 2, and so on. So if I iterate with
valueArray[row, column]
looking for row 2 and column 14, row 3 and column 14, and so on, I get an error because valueArray indexes the columns differently.
Is there any way to get fixed positions from the Excel sheet?

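The simplest way is to apply an offset when iterating over valueArray:
// assumes excelRange starts at row 2, column 14; the array returned by get_Value is 1-based
for (int row = 2; row <= 30; row++)
{
    // replace 100500 with the last populated column
    for (int column = 14; column <= 100500; column++)
    {
        object value = valueArray[row - 1, column - 13];
    }
}
Otherwise you could use another data structure: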
Dictionary<Point, Object> valueArray = new Dictionary<Point, Object>();
valueArray[new Point(2, 14)] = (worksheet.Cells[2, 14] as Excel.Range).Value; // Add values in this way
Object value = valueArray[new Point(2, 14)]; // Read values
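The offset approach can be wrapped in a small helper so the rest of the code keeps using sheet positions. This is a sketch under assumptions: RangeValues and GetCell are made-up names, and firstRow/firstColumn must be the top-left cell of the range the array was read from (row 2, column 14 in the question). It asks the array for its lower bounds instead of hard-coding them, so it works whether the array is 1-based or 0-based.

```csharp
using System;

public static class RangeValues
{
    // Returns the value at an absolute sheet position from an array that
    // covers a range whose top-left cell is (firstRow, firstColumn).
    public static object GetCell(object[,] values, int firstRow, int firstColumn,
                                 int sheetRow, int sheetColumn)
    {
        // Translate sheet coordinates to array coordinates, honoring
        // whatever lower bound the array actually has.
        int r = sheetRow - firstRow + values.GetLowerBound(0);
        int c = sheetColumn - firstColumn + values.GetLowerBound(1);
        return values[r, c];
    }
}
```

For example, RangeValues.GetCell(valueArray, 2, 14, 3, 15) reads the cell at sheet row 3, column 15.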
