Using assertions to compare two excel files - c#

I'm using Visual Studio to create an automated test that creates two Excel sheets. As a final check, I need to compare the content of these two sheets and ensure that they are equal. Is there any way to do this with assertions?
Something like Assert.AreEqual(file1, file2);?
Any help or guidance would be appreciated!

Thanks to Mangist for guidance on this. I've written the following to compare two Excel files:
public bool compareFiles(string filePath1, string filePath2)
{
    //Assume the files match until a differing cell is found
    bool result = true;
    Excel.Application excel = new Excel.Application();
    //Open files to compare
    Excel.Workbook workbook1 = excel.Workbooks.Open(filePath1);
    Excel.Workbook workbook2 = excel.Workbooks.Open(filePath2);
    //Open sheets to grab values from
    Excel.Worksheet worksheet1 = (Excel.Worksheet)workbook1.Sheets[1];
    Excel.Worksheet worksheet2 = (Excel.Worksheet)workbook2.Sheets[1];
    //Get the used range of cells
    Excel.Range range = worksheet2.UsedRange;
    int maxColumns = range.Columns.Count;
    int maxRows = range.Rows.Count;
    //Check that each cell matches; stop at the first mismatch
    for (int i = 1; i <= maxColumns && result; i++)
    {
        for (int j = 1; j <= maxRows && result; j++)
        {
            object value1 = worksheet1.Cells[j, i].Value;
            object value2 = worksheet2.Cells[j, i].Value;
            //object.Equals handles nulls and compares boxed values by value
            if (!object.Equals(value1, value2))
            {
                result = false;
            }
        }
    }
    //Close the workbooks
    GC.Collect();
    GC.WaitForPendingFinalizers();
    Marshal.ReleaseComObject(range);
    Marshal.ReleaseComObject(worksheet1);
    Marshal.ReleaseComObject(worksheet2);
    workbook1.Close();
    workbook2.Close();
    excel.Quit();
    Marshal.ReleaseComObject(excel);
    //Tell us if it is true or false
    return result;
}
And using an assertion to check result:
Assert.IsTrue(compareFiles(testFile, compareFile), "Output files do not match.");

Can you convert the expected/actual Excel sheets to a text format, such as CSV?
If so, you could use Approval Tests instead. This allows you to have a text file as your "expected" test result. When tests fail it can show you the actual result of the test, diff'd against the expected result.
Screenshot taken from this review of Approval Tests.

One option is to use the open-source library EPPlus. You can download it and reference it in your automated test application. EPPlus gives you several ways to read an Excel file and compare its contents. One such option is a C# DataTable. Here is some sample code:
public static DataTable GetDataTableFromExcel(string path, bool hasHeader = true)
{
using (var pck = new OfficeOpenXml.ExcelPackage())
{
using (var stream = File.OpenRead(path))
{
pck.Load(stream);
}
var ws = pck.Workbook.Worksheets.First();
DataTable tbl = new DataTable();
foreach (var firstRowCell in ws.Cells[1, 1, 1, ws.Dimension.End.Column])
{
tbl.Columns.Add(hasHeader ? firstRowCell.Text : string.Format("Column {0}", firstRowCell.Start.Column));
}
var startRow = hasHeader ? 2 : 1;
for (int rowNum = startRow; rowNum <= ws.Dimension.End.Row; rowNum++)
{
var wsRow = ws.Cells[rowNum, 1, rowNum, ws.Dimension.End.Column];
DataRow row = tbl.Rows.Add();
foreach (var cell in wsRow)
{
row[cell.Start.Column - 1] = cell.Text;
}
}
return tbl;
}
}
Apply the same process to both Excel files and compare the two resulting DataTables to get the desired result.
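Once both files are loaded into DataTables, the comparison itself could look something like the sketch below. The class name `DataTableComparer` and the method `AreEqual` are made up for illustration; they are not part of EPPlus or of the answer above. It assumes the tables should match in shape, column names, and every cell value.

```csharp
using System;
using System.Data;

public static class DataTableComparer
{
    // Hypothetical helper: true when both tables have the same shape,
    // the same column names, and equal values in every cell.
    public static bool AreEqual(DataTable expected, DataTable actual)
    {
        if (expected.Rows.Count != actual.Rows.Count ||
            expected.Columns.Count != actual.Columns.Count)
        {
            return false;
        }

        for (int c = 0; c < expected.Columns.Count; c++)
        {
            if (expected.Columns[c].ColumnName != actual.Columns[c].ColumnName)
            {
                return false;
            }
        }

        for (int r = 0; r < expected.Rows.Count; r++)
        {
            for (int c = 0; c < expected.Columns.Count; c++)
            {
                // object.Equals compares the boxed cell values by value
                if (!object.Equals(expected.Rows[r][c], actual.Rows[r][c]))
                {
                    return false;
                }
            }
        }
        return true;
    }
}
```

In a test this could be used as Assert.IsTrue(DataTableComparer.AreEqual(GetDataTableFromExcel(testFile), GetDataTableFromExcel(compareFile)), "Output files do not match.");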

Related

c# working on excel files with large data

I'm copying data from the first sheet of several Excel files into a single workbook. I have already tried alternatives such as NPOI, Spire.XLS and Interop. They work, but they take far too long. I would be really thankful if anyone could suggest a faster approach. I have been through many forums on the web but couldn't find one.
FYI: Most of my files are more than 50 MB in size; a few are 10 MB or less.
This is one of the approaches I have tried (it uses Spire.XLS):
workbook = new Workbook();
//load first file
workbook.LoadFromFile(names[0]);
//load the remaining files starting with second file
for (int i = 1; i < cnt; i++)
{
LoadFIle(names[i]);
//merge the loaded file immediately and then load next file
MergeData();
}
private void LoadFIle(string filePath)
{
//load other workbooks starting with 2nd workbook
tempbook = new Workbook();
tempbook.LoadFromFile(filePath);
}
private void MergeData()
{
try
{
int c1 = workbook.ActiveSheet.LastRow, c2 = tempbook.Worksheets[0].LastRow;
//check if you have exceeded 1st sheet limit
if ((c1 + c2) <= 1048575)
{
//import the second workbook's worksheet into the first workbook using a datatable
//load 1st sheet of tempbook into sheet
Worksheet sheet = tempbook.Worksheets[0];
//copy data from sheet into a datatable
DataTable dataTable = sheet.ExportDataTable();
//load sheet1
Worksheet sheet1 = workbook.Worksheets[workbook.ActiveSheetIndex];
sheet1.InsertDataTable(dataTable, false, sheet1.LastRow + 1, 1);
}
else if ((c1 >= 1048575 && c2 >= 1048575) || c1 >= 1048575 || c2 >= 1048575 || (c1 + c2) >= 1048575)
{
workbook.Worksheets.AddCopy(tempbook.Worksheets[0]);
indx = workbook.ActiveSheet.Index;
workbook.ActiveSheetIndex = ++indx;
}
else
{
//import the second workbook's worksheet into the first workbook using a datatable
//load 1st sheet of tempbook into sheet
Worksheet sheet = tempbook.Worksheets[0];
//copy data from sheet into a datatable
DataTable dataTable = sheet.ExportDataTable();
//load sheet1
Worksheet sheet1 = workbook.Worksheets[workbook.ActiveSheetIndex];
sheet1.InsertDataTable(dataTable, false, sheet1.LastRow + 1, 1);
}
}
catch (IndexOutOfRangeException)
{
}
}
}
This works, but as mentioned it takes a long time. Any suggestions are welcome. Thanks in advance.
Here is my implementation using Excel interop (the fastest I know of). Although I tried carefully to release every COM object, I must have missed one: 2 Excel instances remain in the process list until the program ends.
The key is to have only 2 open Excel instances and to copy the data as a block using Range.Value2.
//Helper function to cleanup
public void ReleaseObject(object obj)
{
if (obj != null && Marshal.IsComObject(obj))
{
Marshal.ReleaseComObject(obj);
}
}
public void CopyIntoOne(List<string> pSourceFiles, string pDestinationFile)
{
var sourceExcelApp = new Microsoft.Office.Interop.Excel.Application();
var destinationExcelApp = new Microsoft.Office.Interop.Excel.Application();
// TODO: Check if it exists
destinationExcelApp.Workbooks.Open(pDestinationFile);
// for debug
//destinationExcelApp.Visible = true;
//sourceExcelApp.Visible = true;
int i = 0;
var sheets = destinationExcelApp.ActiveWorkbook.Sheets;
var lastsheet = destinationExcelApp.ActiveWorkbook.Sheets[sheets.Count];
ReleaseObject(sheets);
foreach (var srcFile in pSourceFiles)
{
sourceExcelApp.Workbooks.Open(srcFile);
// get extents
var lastRow = sourceExcelApp.ActiveSheet.Cells.Find("*", System.Reflection.Missing.Value,
System.Reflection.Missing.Value, System.Reflection.Missing.Value, XlSearchOrder.xlByRows,
XlSearchDirection.xlPrevious, false, System.Reflection.Missing.Value, System.Reflection.Missing.Value);
var lastCol = sourceExcelApp.ActiveSheet.Cells.Find("*", System.Reflection.Missing.Value, System.Reflection.Missing.Value,
System.Reflection.Missing.Value, XlSearchOrder.xlByColumns, XlSearchDirection.xlPrevious, false,
System.Reflection.Missing.Value, System.Reflection.Missing.Value);
var startCell = (Range) sourceExcelApp.ActiveWorkbook.ActiveSheet.Cells[1, 1];
var endCell = (Range) sourceExcelApp.ActiveWorkbook.ActiveSheet.Cells[lastRow.Row, lastCol.Column];
var myRange = sourceExcelApp.ActiveWorkbook.ActiveSheet.Range[startCell, endCell];
// copy the values
var value = myRange.Value2;
// create sheet in new Workbook at the end
Worksheet newSheet = destinationExcelApp.ActiveWorkbook.Sheets.Add(After: lastsheet);
ReleaseObject(lastsheet);
lastsheet = newSheet;
//its even faster when adding it at the front
//Worksheet newSheet = destinationExcelApp.ActiveWorkbook.Sheets.Add();
// change that to a good name
newSheet.Name = ++i + "";
var dstStartCell = (Range) destinationExcelApp.ActiveWorkbook.ActiveSheet.Cells[1, 1];
var dstEndCell = (Range) destinationExcelApp.ActiveWorkbook.ActiveSheet.Cells[lastRow.Row, lastCol.Column];
var dstRange = destinationExcelApp.ActiveWorkbook.ActiveSheet.Range[dstStartCell, dstEndCell];
// this is the actual paste
dstRange.Value2 = value;
//cleanup
ReleaseObject(startCell);
ReleaseObject(endCell);
ReleaseObject(myRange);
ReleaseObject(value);// cannot hurt, but not necessary since its a simple array
ReleaseObject(dstStartCell);
ReleaseObject(dstEndCell);
ReleaseObject(dstRange);
ReleaseObject(newSheet);
ReleaseObject(lastRow);
ReleaseObject(lastCol);
sourceExcelApp.ActiveWorkbook.Close(false);
}
ReleaseObject(lastsheet);
sourceExcelApp.Quit();
ReleaseObject(sourceExcelApp);
destinationExcelApp.ActiveWorkbook.Save();
destinationExcelApp.Quit();
ReleaseObject(destinationExcelApp);
destinationExcelApp = null;
sourceExcelApp = null;
}
I have tested it on small Excel files and am curious how it behaves with larger files.

C# How to write each table in a Word file to its own Excel file

I'm trying to write code in C# WinForms that allows a user to select a directory tree, and extract all of the table data from a word document into an excel file. Presently, the code compiles and you can select your directories, etc, but once it begins to iterate through the loop for each table it crashes.
The program successfully opens the first word file and writes the first excel file (table_1_whatever.xlsx) and saves it in the destination folder. However, on the second table in the same file I get this error on this line of code:
worksheet.Cells[row, col] = objExcelApp.WorksheetFunction.Clean(table.Cell(row, col).Range.Text);
System.Runtime.InteropServices.COMException: 'The requested member of the collection does not exist.'
I can't seem to figure out why it doesn't exist. Each time it goes through the foreach loop it should be creating a new worksheet, but it doesn't appear to be working. Any insight, examples, or suggestions are welcome!
Code:
private void WordRunButton_Click(object sender, EventArgs e)
{
var excelApp = new excel.Application();
excel.Workbooks workbooks = excelApp.Workbooks;
var wordApp = new word.Application();
word.Documents documents = wordApp.Documents;
wordApp.Visible = false;
excelApp.Visible = false;
string[] fileDirectories = Directory.GetFiles(WordSourceBox.Text, "*.doc*",
SearchOption.AllDirectories);
foreach (var item in fileDirectories)
{
word._Document document = documents.Open(item);
int tableCount = 1;
foreach (word.Table table in document.Tables)
{
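// Cell.ToString() only returns the COM type name; compare the cell text instead (Word ends it with "\r\a")
if (table.Cell(1, 1).Range.Text.TrimEnd('\r', '\a') != "Doc Level")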
{
string wordFile = item;
appendName = Path.GetFileNameWithoutExtension(wordFile) + "_Table_" + tableCount + ".xlsx";
var workbook = excelApp.Workbooks.Add(1);
excel._Worksheet worksheet = (excel.Worksheet)workbook.Sheets[1];
for (int row = 1; row <= table.Rows.Count; row++)
{
for (int col = 1; col <= table.Columns.Count; col++)
{
var cell = table.Cell(row, col);
var range = cell.Range;
var text = range.Text;
var cleaned = excelApp.WorksheetFunction.Clean(text);
worksheet.Cells[row, col] = cleaned;
}
}
workbook.SaveAs(Path.Combine(WordOutputBox.Text, Path.GetFileName(appendName)), excel.XlFileFormat.xlWorkbookDefault);
workbook.Close();
Marshal.ReleaseComObject(workbook);
}
else
{
WordOutputStreamBox.AppendText(String.Format("Table {0} ignored\n", tableCount));
}
WordOutputStreamBox.AppendText(appendName + "\n");
tableCount++;
}
document.Close();
Marshal.ReleaseComObject(document);
WordOutputStreamBox.AppendText(item + "\n");
}
WordOutputStreamBox.AppendText("\nAll files parsed");
excelApp.Application.Quit();
workbooks.Close();
excelApp.Quit();
WordOutputStreamBox.AppendText("\nExcel files closed");
Marshal.ReleaseComObject(workbooks);
Marshal.ReleaseComObject(excelApp);
WordOutputStreamBox.AppendText("\nExcel files released");
wordApp.Application.Quit();
wordApp.Quit();
WordOutputStreamBox.AppendText("\nWord files have been quit");
Marshal.ReleaseComObject(documents);
Marshal.ReleaseComObject(wordApp);
WordOutputStreamBox.AppendText("\nWord files have been released\n");
}
Edit 1: (Sorry for posting in the wrong place the first time!)
OK, so the problem has been isolated...
The logic of the code was fine, and the table was in fact there. The issue is that the second table in these files contains a set of split cells, so the program crashes when it reaches one of them.
As a temporary fix, I have set it to ignore the table if the header == whatever. Does anyone know of a solution that actually allows this data to be extracted, though?
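One approach that may help with split or merged cells is to walk the cells that actually exist (table.Range.Cells in the Word object model) instead of looping over Rows.Count and Columns.Count, and to use each cell's RowIndex and ColumnIndex as the target position. The sketch below is untested against the files in question. The class and method names are made up for illustration, and the parameters are typed dynamic only so that it compiles without the interop references; in the code above they would be a word.Table and an excel._Worksheet.

```csharp
using System;

public static class WordTableExport
{
    // Word ends each cell's text with "\r\a" (paragraph mark plus cell marker).
    public static string CleanCellText(string raw)
    {
        return (raw ?? string.Empty).TrimEnd('\r', '\a');
    }

    // Hypothetical helper: copies only the cells that exist in the table,
    // so no (row, col) pair is requested that a split or merged cell removed.
    // Assumption: 'table' is a Word Table and 'worksheet' an Excel worksheet.
    public static void CopyTable(dynamic table, dynamic worksheet)
    {
        foreach (dynamic cell in table.Range.Cells)
        {
            int row = cell.RowIndex;
            int col = cell.ColumnIndex;
            worksheet.Cells[row, col] = CleanCellText((string)cell.Range.Text);
        }
    }
}
```

Inside the foreach over document.Tables, the two nested for loops would then be replaced by WordTableExport.CopyTable(table, worksheet). Cells from a split row land in adjacent Excel columns, so the layout may not line up exactly with the unsplit rows.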

Load Multiple items from an excel sheet into a listview c#

I am desperately trying to add multiple items from an Excel sheet into a ListView using C#. I have looked all over the Internet for a working solution, but still have no result.
I would like to ask anybody who knows about C#'s ListView for a helping hand.
Thanks in advance.
Code so far:
public void InitializeListView(string path) {
Microsoft.Office.Interop.Excel.Application excel = new Microsoft.Office.Interop.Excel.Application();
Microsoft.Office.Interop.Excel.Workbook sheet = excel.Workbooks.Open(path);
Microsoft.Office.Interop.Excel.Worksheet wx = excel.ActiveSheet as Microsoft.Office.Interop.Excel.Worksheet;
int count = 0;
int row = 0;
int col = 0;
Excel.Range userrange = wx.UsedRange;
count = userrange.Rows.Count;
statusBar1.Panels[1].Text = "Amount: " + count;
for (row = 1; row <= count; row++) {
for (col = 1; col <= 4; col++) {
listView1.Items.Add(wx.Cells[row, col].Value2);
listView1.Items.Add(wx.Cells[row, col].Value2);
listView1.Items.Add(wx.Cells[row, col].Value2);
listView1.Items.Add(wx.Cells[row, col].Value2);
}
}
sheet.Close(true, Type.Missing, Type.Missing);
excel.Quit();
}//------------------ end of InitializeListView -------------------------
This might help you; please see https://www.codeproject.com/Questions/460391/Retrieve-datas-from-Excel-Sheet-to-Listview
This is a simple method. Please see if it helps you.
1. Convert the Excel file to .csv and store it at a path.
2. Read the data from the .csv file into a list.
3. Delete the .csv file once all data is loaded into the List<>.
To read from the .csv:
string filepath = @"D:\sample.csv";
int totalLines = File.ReadAllLines(filepath).Length;
List<string> lstSample = new List<string>();
using (StreamReader sr = new StreamReader(filepath))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        // Append the fields of every line instead of overwriting the list
        lstSample.AddRange(line.Split(','));
    }
}
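For a ListView, each row's fields usually need to stay together so that one row becomes one item. A minimal sketch of the reading step, assuming a simple CSV with no quoted fields that contain commas (the class name `CsvRows` is made up for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CsvRows
{
    // Reads a CSV file into one string[] per non-empty line.
    // Assumption: fields never contain commas or line breaks.
    public static List<string[]> Read(string path)
    {
        var rows = new List<string[]>();
        foreach (string line in File.ReadLines(path))
        {
            if (line.Length == 0)
            {
                continue; // skip blank lines
            }
            rows.Add(line.Split(','));
        }
        return rows;
    }
}
```

Each row can then be added with listView1.Items.Add(new ListViewItem(row)); the first field becomes the item text and the rest become sub-items, which are only shown when the ListView is in Details view with columns defined.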

Export List to Excel

So I have a list of data that I am trying to export to Excel. I just want to list it going down column 1, but it refuses. I was originally going to use a foreach loop, but I was worried that would slow down my program and that I wouldn't be able to use the for loop idea I had. Does anyone have any good ideas for how to export this? I feel like it shouldn't be as hard as I am making it. This is what I have done so far. Thanks in advance.
if (dialog == DialogResult.Yes)
{
Microsoft.Office.Interop.Excel.Application excel = new Microsoft.Office.Interop.Excel.Application();
Workbook wb = excel.Workbooks.Add(XlSheetType.xlWorksheet);
Worksheet ws = (Worksheet)excel.ActiveSheet;
ws.Cells[1, 1] = "Folder Names";
for (int row = 0; row <= count; row++)
{
ws.Cells [1, row+2] = Namelist;
}
excel.Visible = true;
}
I want to go sequentially down the list as well. (The code above won't export Namelist; the rest works, though.)
Namelist = list
int count (a counter I started earlier in the program to determine the number of lines in Namelist)
If Namelist is List<string>, the easiest way is to copy it to the Clipboard:
var text = "Folder Names\n" + string.Join("\n", Namelist); // or "\r\n"
System.Windows.Forms.Clipboard.SetText(text);
var xl = new Microsoft.Office.Interop.Excel.Application();
var wb = xl.Workbooks.Add();
var ws = xl.ActiveSheet as Worksheet;
ws.Range["A1"].PasteSpecial();
xl.Visible = true;
or even easier because Excel is associated with .csv files by default:
var fileName = @"list.csv"; // or change to .xls and get warning message box
System.IO.File.WriteAllText(fileName, "Folder Names\n" + string.Join("\n", Namelist));
System.Diagnostics.Process.Start(fileName);
Update
CSV stands for Comma Separated Values, so if you want the list in a different column you have to add commas before the values. For example in columns 2 and 4:
,Folder Names,,Folder Size
,name1,,256
,name2,,"1,024"
If you have 2 lists with the same size, you can zip them together:
string[] names = {"name1", "name2"};
int[] sizes = {256, 1024};
var lines = names.Zip(sizes, (name, size) => name + "," + size); // {"name1,256", "name2,1024"}
var csv = "Names,Sizes\n" + string.Join("\n", lines);
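As the "1,024" line above shows, a value that itself contains a comma has to be wrapped in double quotes. A small helper can take care of that; the sketch below follows the usual CSV convention (quote the field and double any embedded quotes). The names `Csv.Escape` and `Csv.Line` are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Csv
{
    // Quotes a field when it contains a comma, a quote, or a line break,
    // doubling any embedded quotes.
    public static string Escape(string field)
    {
        if (field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) < 0)
        {
            return field;
        }
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Joins one row of fields into a single CSV line.
    public static string Line(IEnumerable<string> fields)
    {
        return string.Join(",", fields.Select(Escape));
    }
}
```

For example, Csv.Line(new[] { "", "name2", "", "1,024" }) produces the third line of the sample above.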
The for loop won't slow down your program, but accessing the cells individually will. Each call to Cells is a COM-interop call, which is relatively expensive. It's much faster to put your data in an array, define a Range that represents the entire range of output, and set the Value there:
Microsoft.Office.Interop.Excel.Application excel = new Microsoft.Office.Interop.Excel.Application();
var wbs = excel.Workbooks;
Workbook wb = wbs.Add(XlSheetType.xlWorksheet);
Worksheet ws = (Worksheet)excel.ActiveSheet;
// Build a (count + 1) x 1 array so the values run down column 1;
// count is assumed to be the number of items in Namelist
object[,] data = new object[count + 1, 1];
data[0, 0] = "Folder Names";
for (int row = 0; row < count; row++)
{
    data[row + 1, 0] = Namelist[row];
}
Excel.Range rng = (Excel.Range)ws.Range[ws.Cells[1, 1], ws.Cells[count + 1, 1]];
// One COM call writes the whole block
rng.Value = data;
excel.Visible = true;
Assuming Namelist is a string[] or List<string>, what you're missing is the extraction of each item from the Namelist collection before setting the value of each cell:
ws.Cells[1, 1] = @"Folder names";
// Rows 2..count+1 hold items 0..count-1 (count = number of items in Namelist)
for (int row = 2; row <= count + 1; row++)
{
    var name = Namelist[row - 2];
    ws.Cells[row, 1] = name;
}
When you use the Cells property the first argument is the row, the second is the column. You have it reversed.
Also, if you haven't gone too far down the path of learning Excel interop, I would switch to EPPlus. It's 100 times easier to work with, doesn't involve messing with COM objects, and doesn't even require Excel. It's just better.
Super easy way to export your list to Excel using C#
How to install ClosedXML with the NuGet Package Manager Console:
PM> Get-Project [ProjectName] | Install-Package ClosedXML
using (var conn = new DB.Entities())
{
var stories = (from a in conn.Subscribers
orderby a.DT descending
select a).Take(100).ToList();
var ShowHeader = true;
PropertyInfo[] properties = stories.First().GetType().GetProperties();
List<string> headerNames = properties.Select(prop => prop.Name).ToList();
var wb = new XLWorkbook();
var ws = wb.Worksheets.Add("Subscribers");
if (ShowHeader)
{
for (int i = 0; i < headerNames.Count; i++)
ws.Cell(1, i + 1).Value = headerNames[i];
ws.Cell(2, 1).InsertData(stories);
}
else
{
ws.Cell(1, 1).InsertData(stories);
}
wb.SaveAs(@"C:\Testing\yourExcel.xlsx");
}

how to create multiple excel sheets

I have to create just one Excel file with 2 sheets, each filled from a different DataTable. The code supplies the file name, so the user only has to click save. The following code lets me send one DataTable to one sheet (I am using C# and ASP.NET, NOT Visual Studio; I am writing my code in Notepad):
string name2="Centroids";
HttpContext context = HttpContext.Current;
context.Response.Clear();
foreach (System.Data.DataRow row in _myDataTable2.Rows)
{
for (int i = 0; i < _myDataTable2.Columns.Count; i++)
{
context.Response.Write(row[i].ToString().Replace(",", string.Empty) + ",");
}
context.Response.Write(Environment.NewLine);
}
context.Response.ContentType = "text/csv";
context.Response.AppendHeader("Content-Disposition", "attachment; filename=" + name2 + ".csv");
But I have no idea how to create the second sheet and use the second DataTable. Any ideas on how to find a solution? That way the user only has to save and download one document, rather than one for each DataTable in the program.
You probably want to explore the possibility of using EPPlus. In my experience, using the Response object has a lot of constraints and takes too much development effort to generate an Excel file.
Url:
http://epplus.codeplex.com/
You should use open source libraries for generating native Excel files; there is no way you can create two sheets with CSV.
Use NPOI (xls) and/or EPPlus (xlsx) to fully control your Excel export. In this answer https://stackoverflow.com/a/9569827/351383 you can see an example of creating an Excel file from a DataTable with EPPlus. You can edit that method to accept a list of DataTables and create a new sheet for each table. It's simple:
ExcelPackage pack = new ExcelPackage();
ExcelWorksheet ws = pack.Workbook.Worksheets.Add(sheetName);
public bool LlenarExcelxlsx(List<DatosEntidad> listaOrigen)
{
bool exito = false;
string[] tipoLista = { "A", "B", "C" };
string nombreArchivo = @"D:\prueba.xlsx";
IWorkbook wb = new XSSFWorkbook();
using (FileStream fileData = new FileStream(nombreArchivo, FileMode.Create, FileAccess.Write))
{
for (int k = 0; k < tipoLista.Length; k++)
{
List<DatosEntidad> listaDestino = listaOrigen
.Where(c => c.tipo == tipoLista[k]).ToList();
DataTable dt = ToDataTable(listaDestino);
ISheet sheetx = wb.CreateSheet("Res_" + tipoLista[k] + k);
ICreationHelper cH = wb.GetCreationHelper();
for (int i = 0; i < dt.Rows.Count; i++)
{
IRow row = sheetx.CreateRow(i);
for (int j = 0; j < 13; j++)
{
ICell cell = row.CreateCell(j);
cell.SetCellValue(cH.CreateRichTextString(dt.Rows[i].ItemArray[j].ToString()));
}
}
}
wb.Write(fileData);
exito = true;
}
return exito;
}
