Copy tables from different spreadsheets into a single spreadsheet - C#

I have different Excel spreadsheets that contain tables with the same structure and the same header.
I need to keep the header in the first spreadsheet and copy the tables of the other spreadsheets one beside the other into the first one, without repeating the header every time.
I have 150 or more spreadsheets, so it takes a long time to do it by hand.
How can I solve this problem with a programming language like Python, C# or Excel VBA?
I saw similar questions but I didn't manage to solve this problem with the answers given to those questions.
Question 1
Question 2

I would probably automate the process using whichever programming language you are most comfortable with that also has a pre-made Excel package. For instance, in C#, I used to use EPPlus.
Essentially you would write code that opens the existing first spreadsheet for writing, opens the other spreadsheets one by one and reads the table content starting from row 2, and copies the data into your master one. Rinse and repeat.
I would even clone the original spreadsheet and rename it to something else while you are testing this so that you don't accidentally clobber anything.
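Since the question also allows Python, here is a minimal sketch of the "keep the header once, append the rest" step. It assumes each table has already been read into a list of rows (for example with an Excel library of your choice) and that the tables are stacked by rows. The name `merge_tables` is made up for illustration.

```python
from typing import Iterable, List, Sequence


def merge_tables(tables: Iterable[Sequence[Sequence[object]]]) -> List[List[object]]:
    """Stack tables that share a header, keeping the header only once."""
    merged: List[List[object]] = []
    header = None
    for table in tables:
        if not table:
            continue  # skip empty sheets
        if header is None:
            # The first non-empty table supplies the header row.
            header = list(table[0])
            merged.append(header)
        elif list(table[0]) != header:
            # Fail loudly instead of silently mixing different layouts.
            raise ValueError("header mismatch between tables")
        # Copy the data rows, i.e. everything from row 2 onwards.
        merged.extend(list(row) for row in table[1:])
    return merged
```

Reading the workbooks and writing the merged rows back out is left to whichever Excel package you pick. The header check is there so that a spreadsheet with a different layout stops the run rather than corrupting the master copy.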

Related

Need to automate crunching of excel data?

I have processes running on Windows XP/7. They generate weekly .csv data files. I have a bunch of Excel formulas that crunch the numbers for each weekly .csv file separately, and then again when the weekly data is added to the one big spreadsheet containing all the data put together.
The number of rows varies each week and for each process. So I can't hardcode that number in my dozens of formulas. So right now I go through this stupid process of manually entering the formulas each week into the .csv files.
There's got to be a way of automating this. Just now I quickly looked into doing this through C# or VB code. Could somebody recommend the best way to do this? Is C# or VB the right way to go? If so, any hints on how to put it all together - what's the model to use? For example, would it look something like this:
C# module reads in .csv data file
C# module creates an Excel spreadsheet and populates it with the .csv data
C# module runs my formulas on all the rows.
Is that how one would approach it? Is there a better way for somebody who has very limited knowledge of C# or VB? I know Java and C++.
Any advice would be highly appreciated.
Thanks
From your explanations in comments, it appears that having a series of template Excel sheets would greatly facilitate the task.
So, for each process that generates data, you say the formulas are always the same, meaning that the columns are always the same (am I right?).
So, even if you don't know how many rows of data, you can still either create a template where only the first row is filled with formulas, and then you simply copy that row over and over, filling it with data as needed, or, you could fill a relatively "comfortable" number of rows with those same formulas, and fill in the data.
There are tons of articles on how to interop with Excel, so it's beyond my intent to provide you with specific code, but the idea is good.
If I may, I have worked in the past with a very interesting tool called FlexCel Studio for .NET, and I have found it to be of great help when it came to generating Excel sheets based on such templates.
Cheers
As others have suggested, I would recommend performing the calculations outside of excel if possible. There are plenty of stats libraries out there that are friendlier to work with than going through the hassle of moving data into excel, applying formulas to cell ranges, and so on.
If you really want to go the Excel route, you can use open-source libraries such as EPPlus (.NET) or POI (Java) to work with .xlsx files directly. Some libraries do not support function evaluation, so you will need to consider this when deciding on a library to use.
If you go with COM interop, you should read the following: Considerations for server-side Automation of Office.
As for C# or VB (if not Java with POI), I would go with C#. C# syntax is similar to Java.
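To illustrate what "performing the calculations outside of Excel" can look like, here is a small sketch using only the Python standard library. The column name and the choice of statistics are assumptions for the example, not something taken from the question.

```python
import csv
import io
import statistics


def summarize_column(csv_text: str, column: str) -> dict:
    """Compute count, sum and mean for one numeric column of CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Skip blank cells; the number of rows does not need to be known up front.
    values = [float(row[column]) for row in reader if row.get(column)]
    return {
        "count": len(values),
        "sum": sum(values),
        # statistics.mean raises StatisticsError if the column has no values.
        "mean": statistics.mean(values),
    }
```

Because the row count is simply whatever the file contains, the "number of rows varies each week" problem goes away without touching any formulas.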
There might be a really simple solution to this problem.
Add 1 piece of auxiliary data to the .csv file either programmatically when running my process or when creating the .xlsx file (with all the formulas) from the .csv file. The auxiliary piece of data is the row count which will be in some known location.
Then modify all my formulas to use the INDIRECT function to specify the range using the cell with the auxiliary piece of data.
I think that might work.
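A rough sketch of how such formulas could be generated, assuming the row count sits in a known cell (here `$Z$1`, an arbitrary choice) and the data starts in row 2 under a header. The helper name is hypothetical.

```python
def indirect_range_formula(function: str, column: str, count_cell: str,
                           first_row: int = 2) -> str:
    """Build an Excel formula whose range end is read from count_cell.

    count_cell is assumed to hold the number of data rows, so the last
    data row is count_cell + first_row - 1.
    """
    last_row = f"{count_cell}+{first_row - 1}"
    return f'={function}(INDIRECT("{column}{first_row}:{column}"&({last_row})))'


# Example: sum column B when the row count is stored in $Z$1.
print(indirect_range_formula("SUM", "B", "$Z$1"))
```

Note that INDIRECT is a volatile function in Excel, so a sheet with dozens of these formulas recalculates on every change.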

C# program runs too long mapping to an excel sheet

I have a C# program that takes a legacy report file and maps it to an Excel sheet. It was running OK, but we changed the process. The legacy program now groups all the detail rows together and my program does the page breaks, whereas before the report arrived with all the pages already in place. This has made the run time about 4 times as long.
I have been told that if I can manually modify Excel to create 'proper' output for the input file, it can speed things up a good bit.
I have also been told to go to an Excel code-behind or add-in, which would run from Excel and thus be faster.
Can someone direct me how to apply these 2 ideas?
We do the code now as an array and write the entire row rather than cell by cell.
Here is a copy of the code: http://www.mediafire.com/?cebg17u5wl0ir25
Automation of Office applications is generally very slow. I just encountered this problem while trying to create a complicated graphic with Visio from C# code. It took about 30s. Now I create an SVG file that is then opened in Visio. Creating the SVG file takes less than 1s now!
I suggest that you export your data as CSV-File and then import it into Excel. Do only the minimum, i.e. the creation of worksheets, the import of the CSV and the formatting, with Excel-automation.
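The export half of that suggestion might look like the sketch below (written in Python for brevity; the same idea applies in C#). The function name and the file path are placeholders, and the import and formatting steps in Excel are not shown.

```python
import csv
from typing import Iterable, Sequence


def export_rows_to_csv(path: str, rows: Iterable[Sequence[object]]) -> None:
    """Write all rows to a CSV file in one pass, without touching Excel."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)
```

The point is that the bulk of the data never goes through Office automation; Excel is only asked to import one file and apply formatting.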

How to import the data from an Excel spreadsheet so it can be manipulated in C#

I have an excel spread sheet (well, hundreds of them) which I need importing into a database.
If the excel data was in a nice uniform format I would simply save them out to CSV, read them in using something like LINQ to CSV and save the required data away.
However, the excel spread sheet is 'uneven' in that different groups of cells contain different data.
I need a way of grabbing the data and then working with cell references to grab the bits I need and save them to the database.
What's the best way to achieve this?
Thanks
UPDATE some more information
I have numerous spread sheets, all identical in structure that need to be imported into a database. The import is not simple in that different chunks of data from the spread sheet will go into different tables. The excel document itself contains a few sections (basically question / answer) type data. For each different section I need to grab the data, shape it into a form that makes sense in terms of the database and save it.
Ideally I would like to create a quick little WPF app that will let me select a spread sheet hit a button and perform the import.
You could use the Excel Object Model to read the data if you do it in a non web environment.
See for example How to automate Microsoft Excel from Microsoft Visual C#.NET.
If it has to be inside a web application, I suggest using Aspose Cells.
Turn the Excel Spread sheet into an ODBC (Open Database Connectivity) Data Source so you can access it just like you would any database:
http://www.datamystic.com/datapipe/excel_odbc.html
Then access it just like any database using ODBC:
http://msdn.microsoft.com/en-us/library/system.data.odbc.odbcconnection(v=vs.71).aspx
When the data is not uniform, it is often better to keep your approach as simple as possible in the first instance. Start with VBA and the "Range" object (which is part of the Excel object hierarchy). From there you can increase the level of automation and in most instances reuse this "Range" work.
avariable = Range("A2:A5")
That notation is not going to change very much. It won't matter what final target language you use (C#, VBA, etc.).
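To show how that range notation carries over to other languages, here is a small sketch in Python that resolves a reference like "A2:A5" against a sheet held as a plain list of rows. It is not the Excel object model, only an illustration; `column_index` and `read_range` are made-up names.

```python
import re


def column_index(letters: str) -> int:
    """Convert column letters to a zero-based index (A -> 0, Z -> 25, AA -> 26)."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def read_range(grid, ref: str):
    """Return the cells addressed by an A1-style range such as "A2:A5"."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+):([A-Za-z]+)(\d+)", ref)
    if match is None:
        raise ValueError(f"not a simple A1-style range: {ref!r}")
    first_col, first_row, last_col, last_row = match.groups()
    top, bottom = int(first_row) - 1, int(last_row) - 1
    left, right = column_index(first_col), column_index(last_col)
    # Slice rows first, then the requested columns within each row.
    return [row[left:right + 1] for row in grid[top:bottom + 1]]
```

With this, the cell references worked out for one spreadsheet can be kept in a table and reused for every file that shares the same structure.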
There are a number of other ways of going about this -- java based / xml based / c# based / and a few other really cool ones that only apply to certain niche situations. If you can provide more information about your use case, then perhaps I can suggest some more things to try.
Q & A
example link for automation from C#: http://support.microsoft.com/kb/302084
You should probably take a look at Microsoft's Visual Studio Tools For Office (VSTO), which handles a lot of the unpleasant COM/interop stuff for you.
To those who may be interested I ended up using LinqToExcel:
http://code.google.com/p/linqtoexcel/
Did exactly what I was after with minimal fuss. Excellent

Methods to compile an Excel spreadsheet into a .NET assembly?

I have a spreadsheet that I'd like to compile into a form that I could call from C#.
Naturally, I'd like to be able to change the inputs to the spreadsheet before reading the calculated result.
What is your recommended method?
UPDATE:
To clarify, I want to make an existing Excel spreadsheet available as a web service that is callable from .NET. I can't have a dependency on Excel, as it's running on a web server.
UPDATE:
I used the answer below, and it worked like a charm. Now I can prototype a formula in Excel, then convert it straight into C# and compile it into an assembly.
This question is also covered under Reading Excel Files as a Server Process.
FlexCel API Mate within TMS Flexcel Studio for .NET lets you convert an existing Excel spreadsheet into C# code, recalculate the spreadsheet, and read the result out of a cell using an API call.
See the video tutorial of FlexCel ApiMate. The video states, quote:
ApiMate will convert an Excel file into a C#, VB.NET or Delphi.NET program.
The docs also state:
Recalculation of more than 200 Excel functions.
and:
You can add your own functions on the code to the already big list implemented by FlexCel, and use them as native functions in your report.
UPDATE
Here is clarification from TMS tech support:
Emailed question:
I'd like to do the following:
Convert an existing .xlsx file to C# code, importing data from a database.
Allow FlexCel to recalculate the spreadsheet for me.
Read an answer out of a cell (for use elsewhere in my C# code).
Skip the step of writing the finished .xlsx file to the disk (we don't need this).
In short, I want to use FlexCel as an "Webserver Excel calculation engine", so we don't have to have Excel installed on the web server to perform spreadsheet calculations.
Are the steps I've described possible? Or have I misunderstood how the component works?
Emailed reply:
You can either load the file directly from the database (by opening from a stream) or use the APIMate tool (included in the tools folder) to convert the file to C# code.
Yes, FlexCel will recalculate it with XlsFile.Recalc()
Yes, you can read the recalculated values too.
Yes, you don't need to write the answer if you don't want to.
Besides this, for using it as a recalculation engine, we have the "RecalculateCell()" method that won't recalculate the full spreadsheet, but only the cells needed to get the value in a specific cell. So, if for example your result is in A1, you can call RecalcCell in A1, and it will recalculate only the cells needed to get the value in A1 (including dependencies, so if A1 has a formula with A2, and A2 with A3, all 3 will be calculated).
There is also a RecalcExpression method, that will recalculate the value of any formula without needing to write it into a cell. So imagine you have a column of numbers at col A, and you want to know the sum. You could use RecalcExpression("=sum(A:A)"); to know the sum, without needing to enter a formula in B1 with the sum and then reading the value of that formula (which you could also do of course)
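The "recalculate only what A1 needs" idea from the reply can be pictured with a toy model. This is not FlexCel code and says nothing about how FlexCel is implemented; it is a sketch in Python where a cell is either a plain value or a pair of (dependencies, function), and every name is hypothetical.

```python
def recalc_cell(cells: dict, target: str, cache: dict = None):
    """Evaluate one cell, computing only the cells it depends on.

    A formula cell is stored as (dependency_names, function); any other
    value is treated as a constant. No cycle detection is attempted.
    """
    if cache is None:
        cache = {}
    if target in cache:
        return cache[target]
    value = cells[target]
    if isinstance(value, tuple):
        dependencies, function = value
        # Recurse into each dependency before applying this cell's formula.
        value = function(*(recalc_cell(cells, name, cache) for name in dependencies))
    cache[target] = value
    return value


sheet = {
    "A3": 2.0,
    "A2": (("A3",), lambda a3: a3 * 10),
    "A1": (("A2",), lambda a2: a2 + 1),
    "B1": (("A3",), lambda a3: a3 ** 2),  # unrelated to A1, never evaluated below
}
visited = {}
print(recalc_cell(sheet, "A1", visited), sorted(visited))
```

Asking for A1 touches A1, A2 and A3 but leaves B1 alone, which matches the behaviour described in the email.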
From Microsoft, there appears to be a framework called Excel Services "Develop A Calculation Engine For Your Apps"
Teaser excerpt:
This article discusses:
Excel as a server-based application
The Excel Services architecture and APIs
Creating managed user-defined functions
Building custom solutions with Excel Services
I have never used it, but the info-graphics on the main page are most encouraging.
Thanks for asking this :)
Calc4web converts spreadsheets into C++ code, which can be called from C#, Java, etc.
Quote from website:
Calc4Web gives programmers a better way to get their job done: create a small spreadsheet which holds the logic, and push a button to turn that spreadsheet into C++ code that works on the first try, code that can be called from any language: C, C++, C#, Visual Basic, Java, and any other language which can call into DLLs.
I also suspect that since it compiles the spreadsheet logic to purely native C++ code, it will be very fast compared to Excel (the website states "5,000 times faster").
Check out ActiveMesa X2C, a tool for converting Excel spreadsheets into C# and C++. (Disclaimer: I'm the author.)
For a list of Spreadsheet components that allow you to work with Excel spreadsheets without having a dependency on Excel, see DevDirect Spreadsheet Components.
SyncFusion Essential Calculate.
Quote:
... you can fully load, manipulate, and compute Excel spreadsheets with no dependence on Excel.
There is no way to compile an Excel spreadsheet into C# code. They are not in any way the same "thing", in the same way that you can't compile this text that you are reading, or a Word document, into code or an assembly of some kind.
You have to be more specific with the kind of functionality you want to get, which will help you get an answer.
There are several ways (pointed out in other answers here) to allow you to access Excel spreadsheet data and utilize them, but this is in no way compiling them.

Open a single worksheet (single tab) from a huge excel file on a web browser using c# asp.net / MVC

I have huge excel files that I have to open from web browser. It takes several minutes to load huge file. Is it possible to open a single worksheet (single tab) at a time from excel file that contains many worksheets? I have to do this using C# / asp.net MVC
I'm assuming you have the excel workbook on the server and just want to send a single worksheet to the client. Does the user then edit the worksheet? Will they be uploading it back?
Assuming this is just a report, then why not use the Open XML SDK to read the workbook, extract the sheet in question and send it back to the client? This is what #Jim in the comments was suggesting. You can get the SDK here: Open XML SDK 2.0 for Microsoft Office. However, I'm not sure if it will work with the 'old' Excel format. I assume you'll need to save the template workbook in the new Office format (.xlsx).
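The reason this kind of extraction is possible without Excel is that an .xlsx file is a ZIP package of XML parts. As a language-neutral illustration (Python standard library rather than the Open XML SDK), the sketch below lists the worksheet names without loading any sheet data. It assumes the workbook part is at the usual `xl/workbook.xml` location and uses the transitional SpreadsheetML namespace.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by ordinary (transitional) .xlsx workbooks; strict-mode
# files use a different one and are not handled by this sketch.
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def list_sheet_names(xlsx_file) -> list:
    """Return worksheet names from an .xlsx path or file-like object."""
    with zipfile.ZipFile(xlsx_file) as package:
        # Only the small workbook part is read, not the worksheet parts.
        root = ET.fromstring(package.read("xl/workbook.xml"))
    return [sheet.get("name") for sheet in root.iter(f"{{{MAIN_NS}}}sheet")]
```

Pulling one sheet out into a new, smaller workbook takes more than this (relationships, shared strings, styles), which is the work a library such as the SDK does for you.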
Your question is slightly unclear as to where the spreadsheet is stored.
If it's on a server you control, process it, extracting sheets you need, and create other sheets which are smaller in size. (Or possibly save them in a different format.).
If they're not on a server you control, download the file using C#, then go through a similar process of extracting the sheet before opening it.
Having said that, I've dealt with some largish spreadsheets (20MB or so), and haven't really had a problem processing the entire spreadsheet as a whole.
So where is the bottleneck? Your network or possibly the machine you're running?
Use third party components.
We fought with server-side Excel generation for years and were defeated.
We bought third-party components and all the problems went away.
From your question, it seems you want to improve load time by using (opening) the data from one worksheet instead of the whole workbook. If this is the case and you only want the data, then access the workbook using ADO.NET with OLEDB provider. (You can use threading to load each worksheet to improve load performance. For instance, loading three large data sets in three worksheets took 17 seconds. Loading each worksheet on a separate thread, loaded same data sets in 5 seconds.)
From experience, performance starts to really suffer with workbooks of 40MB or more. Especially, if workbooks contain many formulas. My largest workbook of 120MB takes several minutes to load. Using OLEDB access, I can load, access, and process the same data in a few seconds.
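The one-thread-per-worksheet idea can be sketched generically. The example below is Python rather than ADO.NET, and `load_sheet` stands in for whatever function actually reads one worksheet (an assumption, since the original code is not shown).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def load_sheets_in_parallel(sheet_names: Iterable[str],
                            load_sheet: Callable[[str], object],
                            max_workers: int = 4) -> Dict[str, object]:
    """Load several worksheets concurrently and return them keyed by name."""
    names = list(sheet_names)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with names.
        results = list(pool.map(load_sheet, names))
    return dict(zip(names, results))
```

Threads help here mainly when the loader spends its time waiting on I/O, as a database-style provider typically does; how much speed-up you see will depend on the workbook and the provider.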
If you want the client to open the data in Excel, gather the data via ADO.NET/OLEDB, get XML and transform it into XMLSS using XSLT. This is easy, and there is plenty of documentation and samples.
If you just want to present the data, gather the data via ADO.NET/OLEDB, get XML and transform it into HTML using XSLT. This is also easy, with plenty of documentation and samples.
Be aware that the browser and computer become non-responsive with large data sets. I had to set an upper limit. If the limit was reached, I notified the user that the results were truncated; otherwise, the user thought the computer was "locked".
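A minimal sketch of that upper-limit check, again in Python and with made-up names: take at most `limit` rows and report whether anything was left over, so the page can show a "results truncated" notice.

```python
from itertools import islice

_NO_MORE_ROWS = object()  # sentinel, so that None is still a valid row


def take_with_limit(rows, limit: int):
    """Return (first `limit` rows, True if more rows were available)."""
    iterator = iter(rows)
    page = list(islice(iterator, limit))
    # Peek at one more row to find out whether the result was cut short.
    truncated = next(iterator, _NO_MORE_ROWS) is not _NO_MORE_ROWS
    return page, truncated
```

Because it works on any iterable, the remaining rows are never materialised when the source is a streaming reader.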
Take a look at this question on Stack Overflow:
Create Excel (.XLS and .XLSX) file from C#
I think you can open your workbook on the server (inside your ASP.NET MVC application) and process only the specific worksheet you want. You can then send such worksheet to the user using NPOI.
The following post shows you how to do that using an ASP.NET MVC application:
Creating Excel spreadsheets .XLS and .XLSX in C#
You can't "say" to Excel, even via Interop, that you only want a single worksheet. There are many reasons for this, such as formulas, references and links between worksheets, which make the task impossible.
If you only want to read the data from the worksheet, maybe OLEDB Data Provider is the best option for you. Here is a full example: Reading excel file using OLEDB Data Provider
Otherwise, you will need to load the entire workbook in memory before doing anything with it.
