Limitation when exporting data to an Excel spreadsheet - c#

I know this question already exists, because it's mine and I put a 500-point bounty on it:
Exporting C# report to Excel when there are more than 5K lines
The answer got me over the hump to some degree, but we're at the point where we simply accept that abnormally large datasets can't be exported via our ASP front end, so we send those requests to our SQL Server DBAs, who run the appropriate stored procedures and copy/paste the results into Excel spreadsheets.
My question is: can someone definitively say whether it's truly impossible to export a large dataset to an Excel spreadsheet via an ASP front end? Once a particular report hits about 8K records, it just can't seem to be done. I'm trying to determine whether any other tweak can be made, or whether that much data is simply more than ASP can handle.

Well... since I've streamed gigabytes of data directly from ASP.NET, I'm pretty sure you're doing something wrong. Try to isolate the problem first: is it putting the data into the session, request/response size limits, or request timeouts? Figure out where the problem is, and then solve it! :)
In general terms, there's no reason to put the data in a DataSet first. Instead, use a SqlDataReader and write the data to the output in chunks. This way you avoid holding the whole dataset in memory; likewise, you can write directly to the output stream without buffering the generated HTML in memory. Why do you keep the data in Session? Wouldn't it be better to hold only the parameters needed to retrieve it, and read it from the DB with the DataReader as needed?
If you're having trouble with timeouts, periodic calls to Response.Flush() help. They also reduce the memory footprint on the ASP.NET side.
Saving the output data to a file on the server first also helps, and it allows you to wire up partial file downloads too - just make sure you actually have enough space on the drive.
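A minimal sketch of the write-to-a-file-first idea, assuming tab-separated text (which Excel opens) is an acceptable output format and that values contain no tabs or line breaks. ExportToFile, WriteRowsToTempFile and the 100 MB free-space margin are hypothetical, not part of the original answer. It checks the free space on the target drive with DriveInfo before writing:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ExportToFile
{
    // Writes each row as one tab-separated line to a new file in the temp folder and returns its path.
    // Assumes values contain no tabs or line breaks. minFreeBytes is an arbitrary safety margin.
    public static string WriteRowsToTempFile(IEnumerable<string[]> rows, long minFreeBytes = 100L * 1024 * 1024)
    {
        string path = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".tsv");

        // Check the free space on the drive that holds the temp folder before writing anything.
        var drive = new DriveInfo(Path.GetPathRoot(Path.GetFullPath(path)));
        if (drive.AvailableFreeSpace < minFreeBytes)
        {
            throw new IOException("Not enough free disk space for the export.");
        }

        // StreamWriter buffers internally, so rows go to disk in chunks rather than piling up in memory.
        using (var writer = new StreamWriter(path))
        {
            foreach (string[] row in rows)
            {
                writer.WriteLine(string.Join("\t", row));
            }
        }
        return path;
    }
}
```

Once the file is complete, an ASP.NET page could send it with Response.TransmitFile(path), which writes the file to the response without buffering it in memory, and delete it afterwards.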
EDIT:
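OK, so you have a SqlCommand. Instead of using it with a SqlDataAdapter, you can do something like this (cmd being your SqlCommand instance):
// Response is the current HttpResponse.
HtmlTextWriter wr = new HtmlTextWriter(Response.Output);
using (var rdr = cmd.ExecuteReader())
{
    int index = 0;
    // WriteFullBeginTag emits "<table>"; WriteBeginTag would emit "<table" without the closing ">".
    wr.WriteFullBeginTag("table");
    wr.WriteLine("<tr><td>Column 1</td><td>Column 2</td></tr>");
    while (rdr.Read())
    {
        wr.WriteFullBeginTag("tr");
        wr.WriteFullBeginTag("td");
        wr.WriteEncodedText(Convert.ToString(rdr["Column1"])); // HTML-encode the cell value
        wr.WriteEndTag("td");
        wr.WriteFullBeginTag("td");
        wr.WriteEncodedText(Convert.ToString(rdr["Column2"]));
        wr.WriteEndTag("td");
        wr.WriteEndTag("tr");
        // Push what has been written so far to the client every 1,000 rows.
        if (++index % 1000 == 0)
        {
            wr.Flush();
            Response.Flush();
        }
    }
    wr.WriteEndTag("table");
}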
I have not tested it, so it might need some tweaking, but the idea should be pretty obvious.
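The same streaming idea can be tried outside ASP.NET. The sketch below (HtmlTableExporter is a hypothetical helper, not part of the original answer) accepts any IDataReader, including a SqlDataReader, and any TextWriter, such as Response.Output. It writes the header from the reader's column names, HTML-encodes every value and flushes every flushEvery rows, so only one row is held in memory at a time:

```csharp
using System;
using System.Data;
using System.IO;
using System.Net;

public static class HtmlTableExporter
{
    // Streams every row of the reader into an HTML table and returns the number of data rows written.
    public static int WriteTable(IDataReader reader, TextWriter output, int flushEvery = 1000)
    {
        // Header row built from the reader's column names.
        output.Write("<table><tr>");
        for (int i = 0; i < reader.FieldCount; i++)
        {
            output.Write("<th>" + WebUtility.HtmlEncode(reader.GetName(i)) + "</th>");
        }
        output.WriteLine("</tr>");

        int rowCount = 0;
        while (reader.Read())
        {
            output.Write("<tr>");
            for (int i = 0; i < reader.FieldCount; i++)
            {
                // Encode values so characters such as < and & cannot break the markup.
                string text = reader.IsDBNull(i) ? "" : Convert.ToString(reader.GetValue(i));
                output.Write("<td>" + WebUtility.HtmlEncode(text) + "</td>");
            }
            output.WriteLine("</tr>");

            // In ASP.NET, Response.Flush() would follow this call.
            if (++rowCount % flushEvery == 0)
            {
                output.Flush();
            }
        }
        output.WriteLine("</table>");
        output.Flush();
        return rowCount;
    }
}
```

A DataTable's CreateDataReader() makes a convenient in-memory reader for trying this out before pointing it at cmd.ExecuteReader().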

It is possible: I have just finished some code for exactly this, as part of a reporting project where we have in excess of 20K records that need to be pulled back and exported to Excel.
I will pull out the code and put it on GitHub for you to look at.
I am using the NPOI library for the Excel processing, plus custom code that turns any List of objects into a DataSet dynamically and then dumps it into the worksheets.
I need to tidy up the code for you, but I should have something ready this evening.
This code will work for both desktop and web apps.
To give you an idea, my code has processed a dataset of over 30K records relatively quickly. I first have to resolve an issue with datasets over 65,536 records (the per-sheet row limit of the legacy .xls format) before it is ready for you.
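The 65,536 figure is the per-sheet row limit of the legacy .xls format that HSSFWorkbook writes; the .xlsx format (NPOI's XSSFWorkbook) allows 1,048,576 rows. If you stay with .xls, one option is to spread the rows over several worksheets. SheetPaging is a hypothetical helper, not taken from the ExcelHelper project, that sketches the paging arithmetic, assuming each sheet repeats a single header row:

```csharp
using System;
using System.Collections.Generic;

public static class SheetPaging
{
    // Rows per worksheet in the legacy .xls format written by HSSFWorkbook.
    public const int XlsMaxRows = 65536;

    // Worksheets needed for dataRows rows when each sheet spends its first row on a header.
    public static int SheetsNeeded(int dataRows, int maxRowsPerSheet = XlsMaxRows)
    {
        int dataRowsPerSheet = maxRowsPerSheet - 1;
        return (dataRows + dataRowsPerSheet - 1) / dataRowsPerSheet; // ceiling division
    }

    // Splits the rows into chunks that each fit on one sheet below its header row.
    public static IEnumerable<List<T>> SplitIntoSheets<T>(IEnumerable<T> rows, int maxRowsPerSheet = XlsMaxRows)
    {
        int dataRowsPerSheet = maxRowsPerSheet - 1;
        var chunk = new List<T>();
        foreach (T row in rows)
        {
            chunk.Add(row);
            if (chunk.Count == dataRowsPerSheet)
            {
                yield return chunk;
                chunk = new List<T>();
            }
        }
        if (chunk.Count > 0)
        {
            yield return chunk;
        }
    }
}
```

Each chunk would then get its own excelworkbook.CreateSheet() call, with the header written to row 0 as in CreateExcelSheet.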
The nice thing about this solution is that it doesn't rely on Excel being installed on the machine hosting it.
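The answer does not show the code that turns a List of objects into a DataSet. A common approach is reflection over the public properties; the sketch below is an assumption about how such a helper might look (ListToDataTable is a hypothetical name), not the author's code:

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Reflection;

public static class ListToDataTable
{
    // Builds a DataTable with one column per public, readable, non-indexer instance property of T,
    // and one row per item.
    public static DataTable ToDataTable<T>(IEnumerable<T> items)
    {
        PropertyInfo[] props = Array.FindAll(
            typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance),
            p => p.CanRead && p.GetIndexParameters().Length == 0);

        var table = new DataTable(typeof(T).Name);
        foreach (PropertyInfo p in props)
        {
            // DataColumn does not accept Nullable<X> types, so use X and store DBNull for nulls.
            Type columnType = Nullable.GetUnderlyingType(p.PropertyType) ?? p.PropertyType;
            table.Columns.Add(p.Name, columnType);
        }

        foreach (T item in items)
        {
            DataRow row = table.NewRow();
            foreach (PropertyInfo p in props)
            {
                row[p.Name] = p.GetValue(item) ?? DBNull.Value;
            }
            table.Rows.Add(row);
        }
        return table;
    }
}
```

The resulting table can be added to a DataSet with dataSet.Tables.Add(table) and passed to CreateExcelSheet.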
EDIT
I have put a project on GitHub here:
https://github.com/JellyMaster/ExcelHelper
but here is the main part that does all the Excel processing:
using System;
using System.Data;
using System.Diagnostics;
using System.IO;
using NPOI.HSSF.UserModel;
using NPOI.SS.UserModel;
using NPOI.XSSF.UserModel;

public static MemoryStream CreateExcelSheet(DataSet dataToProcess)
{
    MemoryStream stream = new MemoryStream();
    if (dataToProcess != null)
    {
        // HSSFWorkbook produces the legacy .xls format (at most 65,536 rows per sheet).
        var excelworkbook = new HSSFWorkbook();
        foreach (DataTable table in dataToProcess.Tables)
        {
            var worksheet = excelworkbook.CreateSheet();
            // Row 0 holds the column names.
            var headerRow = worksheet.CreateRow(0);
            foreach (DataColumn column in table.Columns)
            {
                headerRow.CreateCell(column.Ordinal).SetCellValue(column.ColumnName);
            }
            // Freeze the header row so it stays visible while scrolling.
            worksheet.CreateFreezePane(0, 1, 0, 1);
            int rowNumber = 1;
            foreach (DataRow row in table.Rows)
            {
                var sheetRow = worksheet.CreateRow(rowNumber++);
                foreach (DataColumn column in table.Columns)
                {
                    sheetRow.CreateCell(column.Ordinal).SetCellValue(row[column].ToString());
                }
            }
        }
        excelworkbook.Write(stream);
    }
    return stream;
}
public static DataSet CreateDataSetFromExcel(Stream streamToProcess, string fileExtension = "xlsx")
{
    DataSet model = new DataSet();
    if (streamToProcess != null)
    {
        if (fileExtension == "xlsx")
        {
            // .xlsx (Office Open XML) is read with XSSFWorkbook.
            XSSFWorkbook workbook = new XSSFWorkbook(streamToProcess);
            model = ProcessXLSX(workbook);
        }
        else
        {
            // Anything else is treated as legacy .xls and read with HSSFWorkbook.
            HSSFWorkbook workbook = new HSSFWorkbook(streamToProcess);
            model = ProcessXLSX(workbook);
        }
    }
    return model;
}
private static DataSet ProcessXLSX(HSSFWorkbook workbook)
{
    DataSet model = new DataSet();
    for (int index = 0; index < workbook.NumberOfSheets; index++)
    {
        // Use the loop index; GetSheetAt(0) would read the first sheet on every pass.
        ISheet sheet = workbook.GetSheetAt(index);
        if (sheet != null)
        {
            DataTable table = GenerateTableData(sheet);
            model.Tables.Add(table);
        }
    }
    return model;
}
private static DataTable GenerateTableData(ISheet sheet)
{
    DataTable table = new DataTable(sheet.SheetName);
    for (int rowIndex = 0; rowIndex <= sheet.LastRowNum; rowIndex++)
    {
        // We assume the first row holds the column names.
        IRow row = sheet.GetRow(rowIndex);
        // A completely empty row of data, so stop processing.
        if (row == null)
        {
            break;
        }
        if (rowIndex == 0)
        {
            for (int cellIndex = 0; cellIndex < row.LastCellNum; cellIndex++)
            {
                // GetCell returns null for a cell that was never created, so guard before ToString().
                string value = row.GetCell(cellIndex)?.ToString();
                if (string.IsNullOrEmpty(value))
                {
                    break;
                }
                table.Columns.Add(new DataColumn(value));
            }
        }
        else
        {
            // Now that we know how many columns there are, fill one row of the table.
            DataRow datarow = table.NewRow();
            object[] objectArray = new object[table.Columns.Count];
            for (int columnIndex = 0; columnIndex < table.Columns.Count; columnIndex++)
            {
                try
                {
                    ICell cell = row.GetCell(columnIndex);
                    objectArray[columnIndex] = cell != null ? cell.ToString() : string.Empty;
                }
                catch (Exception error)
                {
                    Debug.WriteLine(error.Message);
                    Debug.WriteLine("Column Index " + columnIndex);
                    Debug.WriteLine("Row Index " + row.RowNum);
                }
            }
            datarow.ItemArray = objectArray;
            table.Rows.Add(datarow);
        }
    }
    return table;
}
private static DataSet ProcessXLSX(XSSFWorkbook workbook)
{
    DataSet model = new DataSet();
    for (int index = 0; index < workbook.NumberOfSheets; index++)
    {
        ISheet sheet = workbook.GetSheetAt(index);
        if (sheet != null)
        {
            DataTable table = GenerateTableData(sheet);
            model.Tables.Add(table);
        }
    }
    return model;
}
This does require the NPOI NuGet package to be installed in your project.
Any questions, give me a shout. The GitHub project does a bit more, but this should be enough to get you going.

Related

How to add a progress bar and status to a process that doesn't have a count like copying files does?

I had posted a question about getting my sample project working, and it now works: Sample Project
That's great, because, as in that example, I have a production project that requires copying files, so it will work well for that one.
But this question is about displaying a progress bar for a process that I'm not clear on how to implement.
I'm reading an Excel file using ClosedXML. I read that file into a DataTable in order to do some filtering and other work to populate some list boxes on my form. How can the sample code in my other post be applied to the creation of 5 or 6 DataTables?
I can provide some of the DataTable creation methods, but the overall code is close to 600 lines right now and not finished yet, so below is a stripped-down sample of the code I'm currently working with:
private void sample()
{
    string plink = @"C:\Test\Sizes.xlsx";
    string[] DistinctDept = { "Dept Code", "Dept Description" };
    DataTable ListDept = GetDistinctRecords(LoadExceltoDatatable(plink), DistinctDept);
    ListDept.Columns.Add(new DataColumn("DeptCombo", typeof(string), "'('+[Dept Code] +') ' + [Dept Description]"));
    if (string.IsNullOrEmpty(ListDept.Rows[0]["Dept Code"].ToString()))
    {
        ListDept.Rows[0].Delete();
        ListDept.AcceptChanges();
    }
    lbDept.DataSource = ListDept;
    lbDept.DisplayMember = "DeptCombo";
    lbDept.ClearSelected();
}
public static DataTable GetDistinctRecords(DataTable dt, string[] Columns)
{
    DataTable dtUniqRecords = new DataTable();
    dtUniqRecords = dt.DefaultView.ToTable(true, Columns);
    return dtUniqRecords;
}
public static DataTable LoadExceltoDatatable(string sizeoptcalc)
{
    using (var wb = new XLWorkbook(sizeoptcalc, XLEventTracking.Disabled))
    {
        var ws = wb.Worksheet(1);
        var foundMonth = ws.Search("Month", System.Globalization.CompareOptions.OrdinalIgnoreCase);
        var monthRow = foundMonth.Last().Address; // A11
        var lastcell = ws.LastCellUsed().Address; // BC3950
        DataTable dataTable = ws.Range(monthRow, lastcell).RangeUsed().AsTable().AsNativeDataTable();
        return dataTable;
    }
}
Can this be changed to report progress? In some cases the Excel files are large and take some time to fill my list boxes.
Here are more of the DataTable creations that I would like to include in the overall progress:
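One way to report progress for work without a natural count is to treat each DataTable you build as one step and report the percentage of steps finished. The sketch below assumes .NET 4.5 or later (Task.Run, IProgress<T>). TableLoader, LoadAllAsync and ImmediateProgress are hypothetical names, and each Func<DataTable> stands for one of your creation methods, such as () => GetDistinctRecords(LoadExceltoDatatable(plink), DistinctDept):

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Threading.Tasks;

public static class TableLoader
{
    // Runs each loader in turn on a background thread and reports percent complete after each one.
    public static Task<List<DataTable>> LoadAllAsync(IReadOnlyList<Func<DataTable>> loaders, IProgress<int> progress)
    {
        return Task.Run(() =>
        {
            var tables = new List<DataTable>(loaders.Count);
            for (int i = 0; i < loaders.Count; i++)
            {
                tables.Add(loaders[i]());
                progress?.Report((i + 1) * 100 / loaders.Count);
            }
            return tables;
        });
    }
}

// Invokes the callback synchronously on the reporting thread (handy for console apps and tests;
// a WinForms form would use Progress<int> instead, which marshals the callback to the UI thread).
public sealed class ImmediateProgress<T> : IProgress<T>
{
    private readonly Action<T> _handler;
    public ImmediateProgress(Action<T> handler) { _handler = handler; }
    public void Report(T value) { _handler(value); }
}
```

On the form, you would create new Progress<int>(p => progressBar1.Value = p) on the UI thread (progressBar1 is a hypothetical control name), await LoadAllAsync from an async event handler, and bind lbDept and the other list boxes after it returns. The granularity is one report per table, so a single slow load still moves the bar only once.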

How to search for empty cells in a specific column in a CSV file for all rows?

Goal: search a single specific column of a CSV file for empty cells (in that column only) and replace them with the string "No Box".
Attempts: So far I have tried CsvHelper and CsvTools (CsvReader) via NuGet. I am not very experienced with C#, so I'm not sure how to accomplish this. Searching did not turn up any examples or references that helped me understand what I need to implement. There are many similar questions, but none of them search a specific column. I am hoping someone can advise me on how to get my for loop to work and how to get the number of rows for the check.
(Image: sample of the CSV data, showing the Site column.)
private static void SiteBlanks()
{
    try
    {
        MutableDataTable dt = DataAccess.DataTable.New.ReadCsv(@"C:\temp.csv");
        for (int i = 0; i <= dt.Rows.Count; i++) // Cannot be applied to data types, so this errors.
        {
            if (!string.IsNullOrEmpty(dt.GetRow(i)["Site"])) // Check if cells in column 1 are empty
            {
                dt.Columns[1].Values[i] = "No Box"; // Update empty values with No Box
            }
        }
        dt.SaveCSV(@"C:\temp.csv"); // Save file after changes.
    }
    catch (Exception ex)
    {
        //Set Error message
        Error("ERROR: SiteBlanks()", ex);
    }
}
Note: This is the first question I've ever asked, so be gentle and tell me what I may have done wrong posting-wise.
Based on your current code, you can try the following:
private static void SiteBlanks() {
    try {
        string filePath = @"C:\temp.csv";
        MutableDataTable dt = DataTable.New.ReadCsv(filePath);
        string columnName = "Site";
        var numberOfRows = dt.NumRows;
        for (int i = 0; i < numberOfRows; i++) {
            var row = dt.GetRow(i);
            if (string.IsNullOrEmpty(row[columnName])) {
                row[columnName] = "No Box";
            }
        }
        dt.SaveCSV(filePath);
    } catch (Exception ex) {
        //Set Error message
        Error("ERROR: SiteBlanks()", ex);
    }
}
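If you would rather not depend on a CSV library for this, the sketch below does the same job with plain string handling. It assumes a simple file: a header on the first line and no quoted fields containing commas or line breaks (for those, a real CSV parser such as CsvHelper is the safer choice). CsvBlankFiller is a hypothetical name. Note the loop bounds: it starts at 1 to skip the header and stops at i < Length, avoiding the off-by-one in the original i <= Count loop:

```csharp
using System;

public static class CsvBlankFiller
{
    // Returns a copy of the CSV lines with empty cells in one named column replaced by a placeholder.
    // Cells holding only whitespace are treated as empty too.
    public static string[] FillBlanks(string[] lines, string columnName, string placeholder)
    {
        if (lines.Length == 0)
        {
            return lines;
        }

        // Find the column by name in the header instead of hard-coding its index.
        int col = Array.IndexOf(lines[0].Split(','), columnName);
        if (col < 0)
        {
            throw new ArgumentException("Column '" + columnName + "' not found.", nameof(columnName));
        }

        var result = (string[])lines.Clone();
        for (int i = 1; i < result.Length; i++)
        {
            if (result[i].Length == 0)
            {
                continue; // leave blank lines alone
            }
            string[] cells = result[i].Split(',');
            if (col < cells.Length && string.IsNullOrWhiteSpace(cells[col]))
            {
                cells[col] = placeholder;
                result[i] = string.Join(",", cells);
            }
        }
        return result;
    }
}
```

Usage against the file from the question (with using System.IO): File.WriteAllLines(@"C:\temp.csv", CsvBlankFiller.FillBlanks(File.ReadAllLines(@"C:\temp.csv"), "Site", "No Box"));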

DataTable Remove Rows Not Reflected in SQL Server

I'm fairly new to C# and this has me stumped. My project is using DataTables and TableAdapters to connect to a SQL Server database. I have a method that opens Excel, builds a DataRow and then passes that to the method below which adds it to my DataTable (cdtJETS) via the TableAdapter (ctaJETS).
public bool AddJETSRecord(DataRow JETSDataRow)
{
    bool bolException = false;
    cdtJETS.BeginLoadData();
    // Add the data row to the table
    try
    {
        cdtJETS.ImportRow(JETSDataRow);
    }
    catch (Exception e)
    {
        // Log an exception
        bolException = true;
        Console.WriteLine(e.Message);
    }
    cdtJETS.EndLoadData();
    // If there were no errors and no exceptions, then accept the changes
    if (!cdtJETS.HasErrors && !bolException)
    {
        ctaJETS.Update(cdtJETS);
        return true;
    }
    else
        return false;
}
The above works fine and the records show up in SQL Server as expected. I have another method that grabs a subset of the records in that DataTable and outputs them to another Excel file (this is a batch process that will collect records over time using the above method and then occasionally output them, so I can't directly move the data from the first Excel file to the second). After the second Excel file is updated I want to delete the records from the table so that they aren't duplicated the next time the method is run. This is where I'm having the issue:
public bool DeleteJETSRecords(DataTable JETSData)
{
    int intCounter = 0;
    DataRow drTarget;
    // Parse all of the rows in the JETS Data that is to be deleted
    foreach (DataRow drCurrent in JETSData.Rows)
    {
        // Search the database data table for the current row's OutputID
        drTarget = cdtJETS.Rows.Find(drCurrent["OutputID"]);
        // If the row is found, then delete it and increment the counter
        if (drTarget != null)
        {
            cdtJETS.Rows.Remove(drTarget);
            intCounter++;
        }
    }
    // Continue if all of the rows were found and removed
    if (JETSData.Rows.Count == intCounter && !cdtJETS.HasErrors)
    {
        cdtJETS.AcceptChanges();
        try
        {
            ctaJETS.Update(dtJETS);
        }
        catch (Exception)
        {
            throw;
        }
        return true;
    }
    else
        cdtJETS.RejectChanges();
    return false;
}
As I step through the method I can see the rows being removed from the DataTable (i.e. if JETSData has 10 rows, at the end cdtJETS has n-10 rows) and no exceptions are thrown, but after I AcceptChanges and Update the TableAdapter, the underlying records are still in my SQL Server table. What am I missing?
The Rows.Remove method is equivalent to calling the row's Delete method followed by the row's AcceptChanges method.
AcceptChanges, whether called on the row or on the whole DataTable, indicates that the change has already been saved to the database: the deleted row is dropped from the table, so the TableAdapter's Update no longer sees anything to delete. This is not what you want.
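The difference between Remove and Delete is easy to check with a plain DataTable. In the sketch below, RemoveVsDelete and its methods are hypothetical names, and the table stands in for cdtJETS. PendingDeletes counts the rows that GetChanges(DataRowState.Deleted) returns, which are the rows a DataAdapter or TableAdapter Update would send as DELETE statements:

```csharp
using System;
using System.Data;

public static class RemoveVsDelete
{
    // A keyed table whose rows are all Unchanged, as they would be right after a TableAdapter Fill.
    public static DataTable MakeTable()
    {
        var table = new DataTable("JETS");
        DataColumn id = table.Columns.Add("OutputID", typeof(int));
        table.PrimaryKey = new[] { id };
        for (int i = 1; i <= 3; i++)
        {
            table.Rows.Add(i);
        }
        table.AcceptChanges();
        return table;
    }

    // Rows still marked Deleted; these are the rows Update turns into DELETE commands.
    public static int PendingDeletes(DataTable table)
    {
        DataTable deleted = table.GetChanges(DataRowState.Deleted);
        return deleted == null ? 0 : deleted.Rows.Count;
    }
}
```

With table.Rows.Remove(table.Rows.Find(2)) the row is gone at once and PendingDeletes returns 0, so Update has nothing to send. With table.Rows.Find(2).Delete() the row stays in Rows, marked Deleted, and PendingDeletes returns 1 until AcceptChanges is called.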
The following should work:
public bool DeleteJETSRecords(DataTable JETSData)
{
    int intCounter = 0;
    DataRow drTarget;
    // Parse all of the rows in the JETS Data that is to be deleted
    foreach (DataRow drCurrent in JETSData.Rows)
    {
        // Search the database data table for the current row's OutputID
        drTarget = cdtJETS.Rows.Find(drCurrent["OutputID"]);
        // If the row is found, mark it as Deleted and increment the counter
        if (drTarget != null)
        {
            drTarget.Delete();
            intCounter++;
        }
    }
    // Continue if all of the rows were found and marked for deletion
    if (JETSData.Rows.Count == intCounter && !cdtJETS.HasErrors)
    {
        // You have to call Update *before* AcceptChanges, or Update has no deleted rows left to send:
        ctaJETS.Update(cdtJETS);
        cdtJETS.AcceptChanges();
        return true;
    }
    cdtJETS.RejectChanges();
    return false;
}

C# ViewState - can't retrieve table

I have a DataTable containing file paths that I am passing via ViewState (a LinkButton references an index in this table), and I then want to use the path from the table to construct an HTTP file transfer. (So three columns: name, path and index.)
I am unable to retrieve the DataTable once it has been saved in ViewState:
ViewState["varFiles"] = filedata;
(When page is originally constructed, then after postback:)
if (!IsPostBack) { SetupSession(); newpopfiles(); }
else
{
    if (ViewState["varFiles"] != null)
    {
        DataTable filedata = new DataTable();
        filedata = (DataTable)Session["varFiles"];
    }
}
From what I understand this should pull back filedata as a table in exactly the same form as before postback. Is this correct?
When I subsequently reference the table, I get a null reference exception. Any ideas?
Many thanks,
Dan
It sounds like you're almost there, just need to be a bit more consistent with using the same storage mechanism :)
The bit to save the DataTable into your session, probably in OnInit() or Page_Load():
DataTable myDataTable = new DataTable(); //... fill it in somehow
Session["varFiles"] = myDataTable;
The bit to read the DataTable after postback:
if (!IsPostBack)
{
    SetupSession();
    newpopfiles();
}
else
{
    DataTable filedata = Session["varFiles"] as DataTable;
    if (filedata != null)
    {
        //... do something
    }
}
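The original bug was a mismatch: the table was written to ViewState but read back from Session, which never contained it. The sketch below reproduces that outside ASP.NET, using two Dictionary<string, object> instances as hypothetical stand-ins for ViewState and Session (the real objects have their own types). StateStoreDemo.TryGetTable is a hypothetical helper that mirrors the answer's "as DataTable" read:

```csharp
using System.Collections.Generic;
using System.Data;

public static class StateStoreDemo
{
    // Reads a DataTable back from a key/value store such as Session or ViewState.
    // "as" yields null (instead of throwing) when the entry is missing or is not a DataTable,
    // so callers must null-check, as the answer's "if (filedata != null)" does.
    public static DataTable TryGetTable(IDictionary<string, object> store, string key)
    {
        object value;
        return store.TryGetValue(key, out value) ? value as DataTable : null;
    }
}
```

Reading from the store the table was never written to returns null, which is exactly what the (DataTable)Session["varFiles"] cast in the question produced before the null reference exception.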

C# Downloadable Excel Files from Class Library

I'm looking for some advice. I'm building an additional feature into a C# project that someone else wrote. The project's solution consists of an MVC web application and a few class libraries.
What I'm editing is the sales reporting function. In the original build, a summary of the sales reports was generated in the web application. When the user generates a sales report, a Reporting class in one of the C# class libraries is called. I'm trying to make the sales reports downloadable as an Excel file when the user selects a radio button.
Here is a snippet of code from the Reporting class:
public AdminSalesReport GetCompleteAdminSalesReport(AdminSalesReportRequest reportRequest)
{
    AdminSalesReport report = new AdminSalesReport();
    string dateRange = null;
    List<ProductSale> productSales = GetFilteredListOfAdminProductSales(reportRequest, out dateRange);
    report.DateRange = dateRange;
    if (productSales.Count > 0)
    {
        report.HasData = true;
        report.Total = GetTotalAdminSales(productSales);
        if (reportRequest.Type == AdminSalesReportRequest.AdminSalesReportType.Complete)
        {
            report.ProductSales = GetAdminProductSales(productSales);
            report.CustomerSales = GetAdminCustomerSales(productSales);
            report.ManufacturerSales = GetAdminManufacturerSales(productSales);
            if (reportRequest.Download)
            {
                FileResult ExcelDownload = GetExcelDownload(productSales);
            }
        }
    }
    return report;
}
So, as you can see, if reportRequest.Download == true, the class should start the process of creating the Excel file. All the GetAdminSales functions do is use LINQ queries to sort the sales when they are displayed on the web page.
So I have added this alongside the GetAdminSales functions:
private FileResult GetExcelDownload(List<ProductSale> productSales)
{
    CustomisedSalesReport CustSalesRep = new CustomisedSalesReport();
    Stream SalesReport = CustSalesRep.GenerateCustomisedSalesStream(productSales);
    return new FileStreamResult(SalesReport, "application/vnd.ms-excel")
    {
        FileDownloadName = "SalesReport" + DateTime.Now.ToString("MMMM d, yyyy") + ".xls"
    };
}
To format the Excel sheet I'm using the NPOI library, and my formatter class is laid out like so:
public class CustomisedSalesReport
{
    public Stream GenerateCustomisedSalesStream(List<ProductSale> productSales)
    {
        return GenerateCustomisedSalesFile(productSales);
    }
    private Stream GenerateCustomisedSalesFile(List<ProductSale> productSales)
    {
        MemoryStream ms = new MemoryStream();
        HSSFWorkbook templateWorkbook = new HSSFWorkbook();
        HSSFSheet sheet = templateWorkbook.CreateSheet("Sales Report");
        HSSFRow dataRow = sheet.CreateRow(0);
        HSSFCell cell = dataRow.CreateCell(0);
        cell = dataRow.CreateCell(0);
        cell.SetCellValue(DateTime.Now.ToString("MMMM yyyy") + " Sales Report");
        dataRow = sheet.CreateRow(2);
        string[] colHeaders = new string[] {
            "Product Code",
            "Product Name",
            "Qty Sold",
            "Earnings",
        };
        int colPosition = 0;
        foreach (string colHeader in colHeaders)
        {
            cell = dataRow.CreateCell(colPosition++);
            cell.SetCellValue(colHeader);
        }
        int row = 4;
        var adminTotalSales = GetAdminProductSales(productSales);
        foreach (SummaryAdminProductSale t in adminTotalSales)
        {
            dataRow = sheet.CreateRow(row++);
            colPosition = 0;
            cell = dataRow.CreateCell(colPosition++);
            cell.SetCellValue(t.ProductCode);
            cell = dataRow.CreateCell(colPosition++);
            cell.SetCellValue(t.ProductName);
            cell = dataRow.CreateCell(colPosition++);
            cell.SetCellValue(t.QtySold);
            cell = dataRow.CreateCell(colPosition++);
            cell.SetCellValue(t.Total.ToString("0.00"));
        }
    }
    templateWorkbook.Write(ms);
    ms.Position = 0;
    return ms;
}
Again, as before, the GetAdminSales functions (GetAdminProductSales, etc.) are at the bottom of the class and are just LINQ queries to gather the data.
When I run this, I don't get any obvious errors. The summary sales report appears on screen as normal, but no Excel document downloads. One thing I have done, which may be causing this, is referencing the System.Web.Mvc DLL in my class library in order to return the file (I have not done it any other way before, and after reading up online I got the impression I could use it in a class library).
When I debug through the code to get a closer picture of what's going on, everything seems to work and all the right data is captured, but I found that from the very start, the MemoryStream ms = new MemoryStream() declaration in my formatter class shows this (very hidden, mind you):
ReadTimeout: '((System.IO.Stream)(ms)).ReadTimeout' threw an exception of type 'System.InvalidOperationException' (type: int)
  + {System.InvalidOperationException}: {"Timeouts are not supported on this stream."} (type: System.SystemException {System.InvalidOperationException})
I get the same for WriteTimeout...
Apologies for the long-windedness of the explanation. I'd appreciate it if anyone could point me in the right direction, either to solve my current issue or towards an alternative way of making this work.
Without getting bogged down in the details, the obvious error is that in GenerateCustomisedSalesFile you create a MemoryStream ms, do nothing with it, then return it.
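The exception the debugger shows in the question is a side effect of inspecting the stream, not the cause of the missing download. The sketch below (MemoryStreamCheck is a hypothetical helper) shows that a plain MemoryStream reports CanTimeout as false and throws that same InvalidOperationException when ReadTimeout is read, and why a stream must be rewound to Position 0 after writing before anything reads it:

```csharp
using System;
using System.IO;
using System.Text;

public static class MemoryStreamCheck
{
    // MemoryStream does not support timeouts: CanTimeout is false and the base Stream.ReadTimeout
    // getter throws InvalidOperationException ("Timeouts are not supported on this stream.").
    // A debugger evaluating the property shows exactly that; it does not affect normal use.
    public static bool TimeoutThrows(Stream stream)
    {
        try
        {
            _ = stream.ReadTimeout;
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    // Writes text into a MemoryStream and reads it back. After writing, Position is at the end,
    // so a reader that starts from the current position sees nothing unless the stream is rewound.
    public static string WriteThenRead(string text, bool rewind)
    {
        var ms = new MemoryStream();
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        ms.Write(bytes, 0, bytes.Length);
        if (rewind)
        {
            ms.Position = 0;
        }
        using (var reader = new StreamReader(ms, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Separately, GetCompleteAdminSalesReport stores the FileResult in the local variable ExcelDownload and then discards it. In ASP.NET MVC a FileResult only produces a download when a controller action returns it, so the controller would need to return that result (for example, when reportRequest.Download is true) instead of the normal view.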
