How do I export only the table contents of a PDF to an Excel file using C#?
I am currently extracting all the contents from PDFs using the PDFNet SDK, but I couldn't read the tables as a tabular structure.
I have not used the SDK for this product, but I have used the standalone product. It reads the content of a PDF into a spreadsheet (with many export options).
The product is OmniPage by Nuance: http://australia.nuance.com/for-business/by-product/omnipage/index.htm.
There is an SDK with a free evaluation.
Using the ByteScout PDF Extractor SDK, we can extract the whole page as below:
using Bytescout.PDFExtractor;

CSVExtractor extractor = new CSVExtractor();
extractor.RegistrationName = "demo";
extractor.RegistrationKey = "demo";

TableDetector tdetector = new TableDetector();
tdetector.RegistrationKey = "demo";
tdetector.RegistrationName = "demo";

// Load the document into both the extractor and the table detector
extractor.LoadDocumentFromFile("C:\\sample.pdf");
tdetector.LoadDocumentFromFile("C:\\sample.pdf");

int pageCount = tdetector.GetPageCount();
// Check the SDK documentation for whether page indexes start at 0 or 1
for (int i = 1; i <= pageCount; i++)
{
    int j = 1;
    do
    {
        // Restrict extraction to the given rectangle (here: the whole page)
        extractor.SetExtractionArea(tdetector.GetPageRect_Left(i),
            tdetector.GetPageRect_Top(i),
            tdetector.GetPageRect_Width(i),
            tdetector.GetPageRect_Height(i));
        // and finally save the table into a CSV file
        extractor.SavePageCSVToFile(i, "C:\\page-" + i + "-table-" + j + ".csv");
        j++;
    } while (tdetector.FindNextTable()); // search for the next table
}
Since this is an old post, I hope this helps others.
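Because the loop above writes each detected table to its own page-N-table-M.csv file, it may be convenient to combine them into a single CSV file, which Excel opens as one sheet. The following is a minimal sketch using only the .NET base library; the class and method names (CsvTableMerger, MergeTableCsvFiles) are hypothetical, and it assumes the file naming used in the code above. It concatenates raw lines and does not parse or re-quote CSV fields.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class CsvTableMerger
{
    // Matches names such as "page-3-table-12.csv" and captures the page and table numbers.
    private static readonly Regex NamePattern =
        new Regex(@"^page-(\d+)-table-(\d+)\.csv$", RegexOptions.IgnoreCase);

    // Hypothetical helper: concatenates every page-N-table-M.csv in `folder` into
    // `outputPath`, ordered numerically by page and then by table, with an empty
    // line between tables. Returns the number of files merged.
    public static int MergeTableCsvFiles(string folder, string outputPath)
    {
        var files = Directory.GetFiles(folder, "page-*-table-*.csv")
            .Select(path => new { path, match = NamePattern.Match(Path.GetFileName(path)) })
            .Where(x => x.match.Success) // the regex filters out near-miss names
            .OrderBy(x => int.Parse(x.match.Groups[1].Value))
            .ThenBy(x => int.Parse(x.match.Groups[2].Value))
            .Select(x => x.path)
            .ToList();

        var lines = new List<string>();
        foreach (var file in files)
        {
            if (lines.Count > 0)
                lines.Add(string.Empty); // separator row between tables
            lines.AddRange(File.ReadAllLines(file));
        }

        File.WriteAllLines(outputPath, lines);
        return files.Count;
    }
}
```

For example, CsvTableMerger.MergeTableCsvFiles(@"C:\", @"C:\all-tables.csv") would merge the files written by the loop above. Sorting on the parsed numbers keeps page-10 after page-2, which a plain string sort would not.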
The answer above (John's) works and is really useful.
However, I use the ByteScout PDF Extractor SDK tools instead of code.
By the way, the tool generates many sheets in one Excel file.
You can use the code below in Excel to combine them into one sheet:
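Sub ConvertAsOne()
    ' Copies the used range of every other sheet below the data on the active sheet
    Dim j As Long, X As Long
    Application.ScreenUpdating = False
    For j = 1 To Sheets.Count
        If Sheets(j).Name <> ActiveSheet.Name Then
            ' First empty row in column A (Rows.Count also works beyond row 65536)
            X = Cells(Rows.Count, 1).End(xlUp).Row + 1
            Sheets(j).UsedRange.Copy Cells(X, 1)
        End If
    Next j
    Range("B1").Select
    Application.ScreenUpdating = True
    MsgBox "Succeeded!", vbInformation, "Note"
End Sub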
Using C# .NET Core, I am updating an existing Excel template with data and formulas using the EPPlus library 4.5.3.3.
As the screenshots below show, all formula cells contain #VALUE! even after calling the Calculate method in the C# code (for reference, I also attached a screenshot of the XML just after downloading the Excel file, before opening it). Automatic calculation is also enabled in Excel.
One blog suggested checking the XML info.
My requirement is to upload this Excel file to a SharePoint site through code and read the formula cells for other operations without opening the file manually.
Is there any other way to calculate the formula cells from code and update the cell values?
I also went through "Why won't this formula calculate unless I double-click a cell?", but no luck.
using (var stream = new MemoryStream(byteArray))
using (ExcelPackage p = new ExcelPackage())
{
    p.Load(stream);
    ExcelWorksheet worksheet = p.Workbook.Worksheets.FirstOrDefault(a => a.Name == "InputTemplate");
    if (worksheet != null)
    {
        worksheet.Cells["A3"].Value = company.CompanyName; // company name
        worksheet.Cells["B3"].Value = product.Name;        // product name
        worksheet.Cells["C3"].Value = product.NetWeight;
        worksheet.Cells["D3"].Value = product.ServingSize;
        worksheet.Cells["E3"].Value = 0;
        var produceAndIngredientDetailsForExcelList = await GetProduceAndIngredientDetails(companyId, productId);
        // rowIndex will be 3
        WriteProduceAndIngredientDetailsInExcel(worksheet, produceAndIngredientDetailsForExcelList);
        // rowIndex is updated based on the number of produce rows, then the aggregates
        StageWiseAggregate(worksheet, produceAndIngredientDetailsForExcelList);
        // write the Total Impacts row
        TotalImpactsFormulaSection(worksheet);
        // calculate once, after all values and formulas have been written
        worksheet.Calculate();
    }
    byte[] bin = p.GetAsByteArray();
    return bin;
}
Formula Code
var columnIndex = 22;///"V" Column
for (; columnIndex <= 27; columnIndex++)
{
var columnName = GetExcelColumnName(columnIndex);
worksheet.Cells[currentRowIndex, columnIndex].Formula = $"=SUBTOTAL(109,{columnName}{firstRowIndex}:{columnName}{currentRowIndex - 1})";
}
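The loop relies on a GetExcelColumnName helper that the question does not show. One common way to write it, sketched here as an assumption rather than the asker's actual code, treats column letters as a bijective base-26 number (there is no zero digit, so 26 is "Z" and 27 is "AA"):

```csharp
using System;

public static class ExcelColumns
{
    // Converts a 1-based column index to Excel column letters:
    // 1 -> "A", 22 -> "V", 27 -> "AA", 16384 -> "XFD".
    public static string GetExcelColumnName(int columnIndex)
    {
        if (columnIndex < 1)
            throw new ArgumentOutOfRangeException(nameof(columnIndex), "Column indexes start at 1.");

        var name = string.Empty;
        while (columnIndex > 0)
        {
            int remainder = (columnIndex - 1) % 26; // 0..25 maps to 'A'..'Z'
            name = (char)('A' + remainder) + name;  // prepend the next letter
            columnIndex = (columnIndex - 1) / 26;   // move to the next "digit"
        }
        return name;
    }
}
```

With this helper, columns 22 to 27 in the loop map to V, W, X, Y, Z and AA.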
I found the solution to this issue thanks to my architect (kudos to him).
I was writing formulas the wrong way by blindly following tutorials such as
https://riptutorial.com/epplus/example/26433/add-formulas-to-a-cell
Note: don't follow the link above.
We should not start formulas with "=" when setting the Formula property. I just removed it, and it worked like a charm:
var columnIndex = 22;///"V" Column
for (; columnIndex <= 27; columnIndex++)
{
var columnName = GetExcelColumnName(columnIndex);
worksheet.Cells[currentRowIndex, columnIndex].Formula = $"SUBTOTAL(109,{columnName}{firstRowIndex}:{columnName}{currentRowIndex - 1})";
}
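If formula strings come from several places (templates, copied tutorials, user input), a small guard can normalize them before they are assigned to the Formula property. This is a hypothetical helper (ToEpplusFormula is not part of EPPlus); it only trims whitespace and drops one leading "=", following the fix described in the answer:

```csharp
using System;

public static class FormulaText
{
    // Hypothetical helper: returns the formula without a leading "=",
    // so both "=SUM(A1:A2)" and "SUM(A1:A2)" end up as "SUM(A1:A2)".
    public static string ToEpplusFormula(string formula)
    {
        if (formula == null)
            throw new ArgumentNullException(nameof(formula));

        string trimmed = formula.Trim();
        return trimmed.Length > 0 && trimmed[0] == '='
            ? trimmed.Substring(1)
            : trimmed;
    }
}
```

It would be used as worksheet.Cells[currentRowIndex, columnIndex].Formula = FormulaText.ToEpplusFormula(formulaText);.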
Here is the official tutorial, which shows it correctly:
https://www.epplussoftware.com/en/Developers/ (check the second slide)
Working result:
Hello, sorry for my English.
I have to select a row of an Excel file, put in some new data and save it.
In the end, the Excel file is always larger than before, even though the amount of data has not increased; it looks as if blank columns are being created to the right.
I think this is because of the following code:
var wb = openWorkBook(filename);
var ws = wb.Worksheet("CNF");
IXLRow row = ws.Row(device.Ordinal - 1 + FirstRow);
for (int j = 0; j < MAXCOLS; ++j)
{
    IXLCell cell = row.Cell(j + FirstCol);
    ...
}
Here the range goes from A1 to XFD1048576.
Even though I take only the row I am interested in and loop over 100 columns, when I call
wb.Save();
the file grows.
So, is there a way to work with only part of the sheet, for example a range already limited to a fixed number of columns, starting from the statement var ws = wb.Worksheet("CNF");?
Thank you
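One possible direction, offered as an unverified sketch: instead of addressing cells one at a time across the row, build an explicit address covering only the columns needed (for example "B5:CW5") and work with that range. The helper below (RowRange.RowAddress, hypothetical) only builds the A1-style address string; passing it to ClosedXML, for example via ws.Range(address), is an assumption, and whether this avoids the file growth has not been tested here.

```csharp
using System;

public static class RowRange
{
    // Hypothetical helper: builds an A1-style address such as "B5:CW5" covering
    // `columnCount` cells of one row, starting at the 1-based column `firstCol`.
    public static string RowAddress(int row, int firstCol, int columnCount)
    {
        if (row < 1 || firstCol < 1 || columnCount < 1)
            throw new ArgumentException("row, firstCol and columnCount must all be >= 1");

        int lastCol = firstCol + columnCount - 1;
        return $"{ColumnLetters(firstCol)}{row}:{ColumnLetters(lastCol)}{row}";
    }

    // 1 -> "A", 26 -> "Z", 27 -> "AA" (bijective base-26, as in Excel column headers).
    private static string ColumnLetters(int index)
    {
        var name = string.Empty;
        while (index > 0)
        {
            name = (char)('A' + (index - 1) % 26) + name;
            index = (index - 1) / 26;
        }
        return name;
    }
}
```

For instance, RowRange.RowAddress(5, 2, 100) yields "B5:CW5", i.e. 100 cells of row 5 starting at column B.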
I have been trying to make a simple app in C# that merges information from one Excel spreadsheet into another, but I can't find any reference on how to do this.
I have the information in one spreadsheet, and I need to copy it into another spreadsheet file.
How can I do this?
Thanks in advance.
Here is another approach you may want to try (the code uses the GemBox.Spreadsheet library):
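using GemBox.Spreadsheet;

// GemBox.Spreadsheet requires a license key to be set before use (free limited mode here)
SpreadsheetInfo.SetLicense("FREE-LIMITED-KEY");

ExcelFile source = ExcelFile.Load("Source.xlsx");
ExcelColumn sourceColumn = source.Worksheets[0].Columns[0];

ExcelFile destination = ExcelFile.Load("Destination.xlsx");
ExcelColumn destinationColumn = destination.Worksheets[0].Columns[0];

// Copy the first column of the source sheet into the first column of the destination sheet
int count = source.Worksheets[0].Rows.Count;
for (int i = 0; i < count; i++)
    destinationColumn.Cells[i].Value = sourceColumn.Cells[i].Value;

destination.Save("Destination.xlsx");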
After trying a few packages from GitHub to parse/process this rather large Excel document,
each of them threw an out-of-memory exception.
I googled some more and found a GNU-licensed library named Koogra, which seems to be the only one that fits the job. I couldn't spend much more time searching, as I am running out of time for this part of the project.
The code I have so far gets past the out-of-memory issue,
so the only thing left is how to properly parse the Excel document so that I can extract a dictionary-like collection where the key is one column and the value is another.
This is the file in question.
This is the code I have so far:
var path = Path.Combine(Environment.CurrentDirectory, "tst.xlsx");
Net.SourceForge.Koogra.Excel2007.Workbook xcel = new Net.SourceForge.Koogra.Excel2007.Workbook(path);
var ss = xcel.GetWorksheets();
I found it after some more googling.
The first line below is for Excel 2007 (.xlsx) files;
the second (commented out) is for the .xls version:
Net.SourceForge.Koogra.IWorkbook genericWB = Net.SourceForge.Koogra.WorkbookFactory.GetExcel2007Reader("tst.xlsx");
//genericWB = Net.SourceForge.Koogra.WorkbookFactory.GetExcelBIFFReader("some.xls");
Net.SourceForge.Koogra.IWorksheet genericWS = genericWB.Worksheets.GetWorksheetByIndex(0);
for (uint r = genericWS.FirstRow; r <= genericWS.LastRow; ++r)
{
    Net.SourceForge.Koogra.IRow row = genericWS.Rows.GetRow(r);
    if (row == null)
        continue; // rows without any cells may come back as null
    for (uint c = genericWS.FirstCol; c <= genericWS.LastCol; ++c)
    {
        var cell = row.GetCell(c);
        if (cell == null)
            continue; // skip empty cells
        // raw value
        Console.WriteLine(cell.Value);
        // formatted value
        Console.WriteLine(cell.GetFormattedValue());
    }
}
I hope this helps anyone else who runs into the same out-of-memory issue.
Enjoy!
A small update to the code above:
OK, I have played with this a little. As far as the content of this particular file is concerned,
the chart is ranked by unique IP, and the current code is:
using System;
using System.IO;
using System.Text;

// Place the source file in your project's bin\debug directory (the current
// directory at runtime); the extracted file is written next to it.
var pathtoRead = Path.Combine(Environment.CurrentDirectory, "tst.xlsx");
var pathtoWrite = Path.Combine(Environment.CurrentDirectory, "tst.txt");
Net.SourceForge.Koogra.IWorkbook genericWB = Net.SourceForge.Koogra.WorkbookFactory.GetExcel2007Reader(pathtoRead);
Net.SourceForge.Koogra.IWorksheet genericWS = genericWB.Worksheets.GetWorksheetByIndex(0);
StringBuilder SbXls = new StringBuilder();
for (uint r = genericWS.FirstRow; r <= genericWS.LastRow; ++r)
{
    Net.SourceForge.Koogra.IRow row = genericWS.Rows.GetRow(r);
    if (row == null)
        continue; // rows without any cells may come back as null
    for (uint ColCount = genericWS.FirstCol; ColCount <= genericWS.LastCol; ++ColCount)
    {
        // Only the first two columns are exported: column 0 is the key, column 1 the value
        if (ColCount > 1)
            break;
        var cell = row.GetCell(ColCount);
        var formated = cell == null ? string.Empty : cell.GetFormattedValue();
        string LineEnding = ColCount == 0 ? "\t" : Environment.NewLine;
        SbXls.Append(formated).Append(LineEnding);
    }
}
File.WriteAllText(pathtoWrite, SbXls.ToString());
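The question asked for a dictionary-like collection where one column is the key and another is the value. Since the loop above writes the first two columns as key&lt;TAB&gt;value lines to tst.txt, one way to get there is to read that file back into a Dictionary. This is a minimal sketch with a hypothetical KeyValueFile class; note that a header row, if the sheet has one, becomes an entry too.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class KeyValueFile
{
    // Reads "key<TAB>value" lines (the layout written to tst.txt above) into a dictionary.
    // Lines without a tab or with an empty key are skipped; a repeated key keeps the last value.
    public static Dictionary<string, string> Load(string path)
    {
        var result = new Dictionary<string, string>(StringComparer.Ordinal);
        foreach (var line in File.ReadLines(path))
        {
            int tab = line.IndexOf('\t');
            if (tab < 0)
                continue;

            string key = line.Substring(0, tab).Trim();
            string value = line.Substring(tab + 1).Trim();
            if (key.Length > 0)
                result[key] = value;
        }
        return result;
    }
}
```

For example, var byIp = KeyValueFile.Load(pathtoWrite); would give a lookup from the first column to the second.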
I wanted to ask whether there is a practical way to add multiple hyperlinks to an Excel worksheet with C#. I want to generate a list of websites with hyperlinks anchored to them, so the user can click a hyperlink and go to that website.
So far I have come up with a simple nested for loop, which iterates over every cell in a given Excel range and adds a hyperlink to that cell:
// Excel Interop ranges are 1-based, so loop from 1 and offset into the 0-based array
for (int i = 1; i <= _range.Rows.Count; i++)
{
    Microsoft.Office.Interop.Excel.Range row = _range.Rows[i];
    for (int j = 1; j <= row.Cells.Count; j++)
    {
        Microsoft.Office.Interop.Excel.Range cell = row.Cells[j];
        cell.Hyperlinks.Add(cell, adresses[i - 1, j - 1], _optionalValue, _optionalValue, _optionalValue);
    }
}
The code works as intended, but it is extremely slow because of thousands of calls to the Hyperlinks.Add method.
One thing that intrigues me is that set_Value from Office.Interop.Excel can write thousands of strings in one call, but there is no similar method for adding hyperlinks (Hyperlinks.Add adds just one hyperlink).
So my question is: is there a way to optimize adding a large number of hyperlinks to an Excel file in C#?
Any help would be appreciated.
I am using VS2010 and MS Excel 2010.
I have the very same problem (adding 300 hyperlinks via Range.Hyperlinks.Add takes approx. 2 minutes).
The runtime cost comes from the many Range instances, each of which is a separate call into Excel.
Solution:
Use a single Range instance and add the hyperlinks with the =HYPERLINK(target, [friendlyName]) Excel formula.
Example:
using System;
using System.Collections.Generic;
using Xl = Microsoft.Office.Interop.Excel;

List<string> urlsList = new List<string>();
urlsList.Add("http://www.gin.de");
// ^^ n times ...

// create a 2-D array (n rows x 1 column) with one formula per row
object[,] content = new object[urlsList.Count, 1];
for (int i = 0; i < urlsList.Count; i++)
{
    content[i, 0] = string.Format("=HYPERLINK(\"{0}\")", urlsList[i]);
}

// get a range with exactly urlsList.Count rows (Excel row numbers start at 1)
string rangeDescription = string.Format("A1:A{0}", urlsList.Count);
Xl.Range xlRange = worksheet.Range[rangeDescription, Type.Missing];

// set all values with a single call
xlRange.Value2 = content;
... this takes just about 1 second.
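The answer's formula passes only the URL, but HYPERLINK also accepts a friendly name as its second argument. The sketch below (the HyperlinkFormulas class is hypothetical) builds the same n x 1 object[,] array with an optional friendly name, doubling any embedded double quotes as Excel string literals require. It does not touch Excel itself, so the result would still be assigned to a one-column range's Value2 in a single call, as in the answer.

```csharp
using System;
using System.Collections.Generic;

public static class HyperlinkFormulas
{
    // Builds an n x 1 array of =HYPERLINK(...) formulas, one per URL, suitable for
    // assigning to a one-column Range.Value2. friendlyNames is optional.
    public static object[,] Build(IList<string> urls, IList<string> friendlyNames = null)
    {
        if (friendlyNames != null && friendlyNames.Count != urls.Count)
            throw new ArgumentException("friendlyNames must have one entry per URL");

        var content = new object[urls.Count, 1];
        for (int i = 0; i < urls.Count; i++)
        {
            string target = Quote(urls[i]);
            string name = friendlyNames != null ? friendlyNames[i] : null;
            content[i, 0] = name == null
                ? $"=HYPERLINK({target})"
                : $"=HYPERLINK({target},{Quote(name)})";
        }
        return content;
    }

    // Wraps text in double quotes, escaping embedded quotes as "" (Excel string literal rules).
    private static string Quote(string text) => "\"" + text.Replace("\"", "\"\"") + "\"";
}
```

For example, xlRange.Value2 = HyperlinkFormulas.Build(urlsList); with xlRange covering A1:A{urlsList.Count}.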