I have processes running on Windows XP/7 that generate weekly .csv data files. I have a bunch of Excel formulas that crunch the numbers for each week's .csv file separately, and again when the weekly data is added to the one big spreadsheet containing all the data put together.
The number of rows varies each week and for each process. So I can't hardcode that number in my dozens of formulas. So right now I go through this stupid process of manually entering the formulas each week into the .csv files.
There's got to be a way of automating this. Just now I quickly looked into doing this through C# or VB code. Could somebody recommend the best way to do this? Is C# or VB the right way to go? If so, any hints on how to put it all together - what's the model to use? For example, would it look something like this:
C# module reads in .csv data file
C# module creates an Excel spreadsheet and populates it with the .csv data
C# module runs my formulas on all the rows.
Is that how one would approach it? Is there a better way for somebody who has very limited knowledge of C# or VB? I know Java and C++.
Any advice would be highly appreciated.
Thanks
From your explanations in comments, it appears that having a series of template Excel sheets would greatly facilitate the task.
So, for each process that generates data, you say the formulas are always the same, meaning that the columns are always the same (am I right?).
So, even if you don't know how many rows of data there will be, you have two options. You can create a template where only the first row is filled with formulas, and then copy that row down as many times as needed while filling in the data. Or you can pre-fill a relatively "comfortable" number of rows with those same formulas, and then fill in the data.
There are tons of articles on how to use Interop with Excel, so it's beyond my intent to provide you with specific code, but the idea is sound.
If I may, I have worked in the past with a very interesting tool called FlexCel Studio for .NET, and I have found it to be of great help when it came to generating Excel sheets based on such templates.
Cheers
As others have suggested, I would recommend performing the calculations outside of excel if possible. There are plenty of stats libraries out there that are friendlier to work with than going through the hassle of moving data into excel, applying formulas to cell ranges, and so on.
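As a rough illustration of doing the number crunching outside Excel, here is a minimal Python sketch that summarizes every numeric column of a .csv file regardless of how many rows it has. The column names and sample data are made up for the example, and the count/sum/mean summary is only a stand-in for whatever the real formulas compute.

```python
import csv
import io
import statistics


def summarize_columns(csv_text):
    """Return {column: (count, total, mean)} for each column holding numbers."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = {}
    for row in reader:
        for name, value in row.items():
            try:
                number = float(value)
            except (TypeError, ValueError):
                continue  # skip text cells and missing values
            columns.setdefault(name, []).append(number)
    return {
        name: (len(values), sum(values), statistics.mean(values))
        for name, values in columns.items()
    }


# Hypothetical weekly file; the row count can differ every week.
sample = "process,duration_s,errors\nA,1.5,0\nA,2.5,2\nA,3.5,1\n"
summary = summarize_columns(sample)
for name, (count, total, mean) in summary.items():
    print(name, count, total, mean)
```

For a real file, the text would come from `open(path, newline="")` instead of the inline sample.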
If you really want to go the Excel route, you can use open-source libraries such as EPPlus (.NET) or POI (Java) to work with .xlsx files directly. Some libraries do not support formula evaluation, so you will need to consider this when deciding which library to use.
If you go with COM interop, you should read the following: Considerations for server-side Automation of Office.
As for C# or VB (if not Java with POI), I would go with C#. C# syntax is similar to Java's.
There might be a really simple solution to this problem.
Add 1 piece of auxiliary data to the .csv file either programmatically when running my process or when creating the .xlsx file (with all the formulas) from the .csv file. The auxiliary piece of data is the row count which will be in some known location.
Then modify all my formulas to use the INDIRECT function to specify the range using the cell with the auxiliary piece of data.
I think that might work.
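A minimal sketch of that idea in Python, under assumptions that are mine rather than part of the plan above: the row count is written as a new first line, so it lands in cell B1, the header moves to row 2, and the data starts in row 3. The helper names `prepend_row_count` and `indirect_range` are hypothetical.

```python
import csv
import io


def prepend_row_count(csv_text):
    """Put 'ROWCOUNT,<n>' on the first line; n counts data rows, not the header."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    data_rows = max(len(rows) - 1, 0)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["ROWCOUNT", data_rows])
    writer.writerows(rows)
    return out.getvalue()


def indirect_range(column, count_cell="$B$1", first_data_row=3):
    """Build an INDIRECT expression spanning the data rows of one column."""
    offset = first_data_row - 1
    return f'INDIRECT("{column}{first_data_row}:{column}" & ({count_cell} + {offset}))'


print(prepend_row_count("x,y\n1,2\n3,4\n"))
print("=SUM(" + indirect_range("A") + ")")
```

With two data rows, B1 holds 2 and the generated formula resolves to the range A3:A4, so the same formula text works for any week's file.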
I have different excel spreadsheets that contain tables with the same structure and the same header.
I need to conserve the header in the first spreadsheet and copy the tables of the other spreadsheets one beside the other in the first one, without repeating the header every time.
I have 150 or more spreadsheets, it takes a long time to do it by hand.
How can I solve this problem with a programming language like Python, C# or excel VBA?
I saw similar questions but I didn't manage to solve this problem with the answers given to those questions.
Question 1
Question 2
I would probably automate the process using whichever programming language you are most comfortable with that also has a pre-made Excel package. For instance, in C#, I used to use EPPlus.
Essentially you would write code that opens the existing first spreadsheet for writing, opens the other spreadsheets one by one and reads the table content starting from row 2, and copies the data into your master one. Rinse and repeat.
I would even clone the original spreadsheet and rename it to something else while you are testing this so that you don't accidentally clobber anything.
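The loop described above might look roughly like this in Python. To keep the sketch free of third-party packages, it assumes each table has already been read into a list of rows (for example from CSV exports); with an Excel library the same logic would apply to rows read from each worksheet. The function name `merge_tables` is made up for the example.

```python
def merge_tables(tables):
    """Combine tables that share a header, keeping the header only once.

    Each table is a list of rows, and each row is a list of cell values.
    """
    merged = []
    header = None
    for index, rows in enumerate(tables):
        if not rows:
            continue  # ignore empty sheets
        if header is None:
            header = rows[0]
            merged.append(header)
        elif rows[0] != header:
            raise ValueError(f"table {index} has a different header: {rows[0]!r}")
        merged.extend(rows[1:])  # data starts at the second row
    return merged


first = [["id", "value"], [1, "a"], [2, "b"]]
second = [["id", "value"], [3, "c"]]
for row in merge_tables([first, second]):
    print(row)
```

Checking the header of every sheet before copying is a deliberate choice: with 150 files, a silently misaligned column is harder to find afterwards than an error raised during the merge.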
Is there any tool out there which will format excel formulas in such a way that they are more easily decipherable?
I need to convert some complex excel spreadsheets people have made to C# applications and sitting there looking at one line excel formulas is relatively troublesome. Primarily I'm looking for something that can rewrite them to pseudocode or a more readable programming language.
The closest thing I could find was http://ewbi.blogs.com/develops/2004/12/excel_formula_p.html but this still does not help all that much.
Sean Cheshire's comment is probably the best answer you're going to get:
Excel Formula Beautifier
It has a feature that allows you to convert to javascript.
The one thing it seems to be missing is a feature to do a batch of formulas at once (which would save a lot of time if converting a whole sheet with "Show Formulas" turned on). I added a request for this.
Given how the app works now, if you try pasting a batch into the single-line input field, you may still get a result that's at least somewhat useful. I recommend giving it a try.
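If batch processing matters, a very small indenter is easy to write yourself. The sketch below is not how Excel Formula Beautifier works; it is a naive Python approach that breaks a formula after each opening parenthesis and each comma, and it assumes commas are the argument separator and double quotes delimit strings.

```python
def indent_formula(formula, indent="    "):
    """Spread a one-line Excel formula over indented lines, one argument per line."""
    out = []
    depth = 0
    in_string = False
    after_break = False
    for ch in formula:
        if in_string or ch == '"':
            # Copy string literals verbatim; a doubled quote toggles twice.
            if ch == '"':
                in_string = not in_string
            out.append(ch)
            after_break = False
        elif ch == "(":
            depth += 1
            out.append("(\n" + indent * depth)
            after_break = True
        elif ch == ")":
            depth = max(depth - 1, 0)
            out.append("\n" + indent * depth + ")")
            after_break = False
        elif ch == ",":
            out.append(",\n" + indent * depth)
            after_break = True
        elif ch == " " and after_break:
            continue  # drop spaces that would start a new line
        else:
            out.append(ch)
            after_break = False
    return "".join(out)


formulas = ['=IF(A1>0,SUM(B1:B3),"n/a")', "=ROUND(AVERAGE(C1:C9), 2)"]
for formula in formulas:
    print(indent_formula(formula))
    print()
```

Feeding it every line of a sheet saved with "Show Formulas" turned on gives the batch behaviour mentioned above, though it only reformats the formula and does not translate it into another language.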
I need to read XLSX files and extract a maximum amount of content from it. Which of the API's should I use?
OLE DB, open XML SDK, or Excel Interop?
Which is the easiest to use?
Can you retrieve all the information using one or the other? i.e, date, times, merged cells, tables, pivottables, etc.
You can try all of them and choose the one that fits you most...
Depending on the data you want to read, I'd suggest using Open XML over Interop or OLE DB.
I don't know the Open XML SDK, although I have some experience with the EPPlus library, which I use a lot and can only say good things about: it's fast, easy to learn, and comes with good examples. The library is based on the Office Open XML format, so I suppose it's pretty much the same as the SDK you've mentioned, and it can easily read and write Excel 2007 and 2010 files.
On the linked site, you'll find the library itself, documentation and some example "Hello World" projects to download.
Why that library in the first place? Because with it you will be able to read not only cells values, but also their colors, fonts, widths and heights, merging and all that detailed stuff, that you can not only read, but modify as well. What's more, you don't need the Excel installed to do that.
In second place, in case you only need to extract tabular data from a worksheet, you may play with OLE DB. I'm afraid that with it you won't be able to extract any info about formats, colors etc., and the data must be in a tabular worksheet, so that you can treat it as a database table.
The last one is Interop, because:
- it's a COM library, so you need to be very careful when using it from .NET, as it's easy to cause some ugly and hard-to-find memory leaks (confirmed by my own bad experience) - if you don't dispose of its objects properly, it leaves the Excel.exe process open,
- it's much slower than the previous methods,
- basically, it adds almost no value over the previous methods (EPPlus or OLE DB) and requires Excel to be installed on the client's machine, so why use it?
Good luck, then.
I have an excel spread sheet (well, hundreds of them) which I need importing into a database.
If the excel data was in a nice uniform format I would simply save them out to CSV, read them in using something like LINQ to CSV and save the required data away.
However, the excel spread sheet is 'uneven' in that different groups of cells contain different data.
I need a way of grabbing the data and then working with cell references to grab the bits I need and save them to the database.
What's the best way to achieve this?
Thanks
UPDATE some more information
I have numerous spread sheets, all identical in structure that need to be imported into a database. The import is not simple in that different chunks of data from the spread sheet will go into different tables. The excel document itself contains a few sections (basically question / answer) type data. For each different section I need to grab the data, shape it into a form that makes sense in terms of the database and save it.
Ideally I would like to create a quick little WPF app that will let me select a spread sheet hit a button and perform the import.
You could use the Excel Object Model to read the data if you do it in a non web environment.
See for example How to automate Microsoft Excel from Microsoft Visual C#.NET.
If it has to be inside a web application, I suggest using Aspose Cells.
Turn the Excel Spread sheet into an ODBC (Open Database Connectivity) Data Source so you can access it just like you would any database:
http://www.datamystic.com/datapipe/excel_odbc.html
Then access it just like any database using ODBC:
http://msdn.microsoft.com/en-us/library/system.data.odbc.odbcconnection(v=vs.71).aspx
When the data is not uniform, it is often better to keep your approach as simple as possible in the first instance. Start with VBA and the "Range" object (which is part of the Excel object hierarchy). From there you can increase the level of automation and in most instances reuse this "Range" work.
avariable = Range("A2:A5")
That notation is not going to change very much. It won't matter what final target language you use (C#, VBA, etc.).
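To show how little is tied to VBA here, the following Python sketch turns an A1-style reference such as "A2:A5" into 1-based row and column numbers, which is the form most spreadsheet libraries work with internally. The function names are hypothetical and the parser only handles plain cell references, not sheet names or whole-column ranges.

```python
import re

# One cell reference: optional $ signs, column letters, then a row number.
_CELL = re.compile(r"^\$?([A-Za-z]{1,3})\$?(\d+)$")


def cell_to_indices(ref):
    """Convert 'B7' (or '$B$7') to a 1-based (row, column) pair."""
    match = _CELL.match(ref)
    if not match:
        raise ValueError(f"not a cell reference: {ref!r}")
    letters, digits = match.groups()
    column = 0
    for letter in letters.upper():
        column = column * 26 + (ord(letter) - ord("A") + 1)
    return int(digits), column


def range_to_bounds(ref):
    """Convert 'A2:A5' to ((first_row, first_col), (last_row, last_col))."""
    start, _, end = ref.partition(":")
    return cell_to_indices(start), cell_to_indices(end or start)


print(range_to_bounds("A2:A5"))
print(cell_to_indices("AA10"))
```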
There are a number of other ways of going about this -- java based / xml based / c# based / and a few other really cool ones that only apply to certain niche situations. If you can provide more information about your use case, then perhaps I can suggest some more things to try.
Q & A
example link for automation from C#: http://support.microsoft.com/kb/302084
You should probably take a look at Microsoft's Visual Studio Tools For Office (VSTO), which handles a lot of the unpleasant COM/interop stuff for you.
To those who may be interested I ended up using LinqToExcel:
http://code.google.com/p/linqtoexcel/
Did exactly what I was after with minimal fuss. Excellent
I have a spreadsheet that I'd like to compile into a form that I could call from C#.
Naturally, I'd like to be able to change the inputs to the spreadsheet before reading the calculated result.
What is your recommended method?
UPDATE:
To clarify, I want to make an existing Excel spreadsheet available as a web service that is callable from .NET. I can't have a dependency on Excel, as it's running on a web server.
UPDATE:
I used the answer below, and it worked like a charm. Now I can prototype a formula in Excel, then convert it straight into C# and compile it into an assembly.
This question is also covered under Reading Excel Files as a Server Process.
FlexCel API Mate within TMS Flexcel Studio for .NET lets you convert an existing Excel spreadsheet into C# code, recalculate the spreadsheet, and read the result out of a cell using an API call.
See the video tutorial of FlexCel ApiMate. The video states, quote:
ApiMate will convert an Excel file into a C#, VB.NET or Delphi.NET program.
The docs also state:
Recalculation of more than 200 Excel functions.
and:
You can add your own functions on the code to the already big list implemented by FlexCel, and use them as native functions in your report.
UPDATE
Here is clarification from TMS tech support:
Emailed question:
I'd like to do the following:
Convert an existing .xlsx file to C# code, importing data from a database.
Allow FlexCel to recalculate the spreadsheet for me.
Read an answer out of a cell (for use elsewhere in my C# code).
Skip the step of writing the finished .xlsx file to the disk (we don't need this).
In short, I want to use FlexCel as an "Webserver Excel calculation engine", so we don't have to have Excel installed on the web server to perform spreadsheet calculations.
Are the steps I've described possible? Or have I misunderstood how the component works?
Emailed reply:
You can either load the file directly from the database (by opening from a stream) or use the APIMate tool (included in the tools folder) to convert the file to C# code.
Yes, FlexCel will recalculate it with XlsFile.Recalc()
Yes, you can read the recalculated values too.
Yes, you don't need to write the answer if you don't want to.
Besides this, for using it as a recalculation engine, we have the "RecalculateCell()" method that won't recalculate the full spreadsheet, but only the cells needed to get the value in a specific cell. So, if for example your result is in A1, you can call RecalcCell in A1, and it will recalculate only the cells needed to get the value in A1 (including dependencies, so if A1 has a formula with A2, and A2 with A3, all 3 will be calculated).
There is also a RecalcExpression method, that will recalculate the value of any formula without needing to write it into a cell. So imagine you have a column of numbers at col A, and you want to know the sum. You could use RecalcExpression("=sum(A:A)"); to know the sum, without needing to enter a formula in B1 with the sum and then reading the value of that formula (which you could also do of course)
Microsoft appears to have a framework called Excel Services; see the article "Develop A Calculation Engine For Your Apps".
Teaser excerpt:
This article discusses:
Excel as a server-based application
The Excel Services architecture and APIs
Creating managed user-defined functions
Building custom solutions with Excel Services
I have never used it, but the info-graphics on the main page are most encouraging.
Thanks for asking this :)
Calc4web converts spreadsheets into C++ code, which can be called from C#, Java, etc.
Quote from website:
Calc4Web gives programmers a better way to get their job done: create a small spreadsheet which holds the logic, and push a button to turn that spreadsheet into C++ code that works on the first try, code that can be called from any language: C, C++, C#, Visual Basic, Java, and any other language which can call into DLLs.
I also suspect that since it compiles the spreadsheet logic to purely native C++ code, it will be very fast compared to Excel (the website states "5,000 times faster").
Check out ActiveMesa X2C, a tool for converting Excel spreadsheets into C# and C++. (Disclaimer: I'm the author.)
For a list of Spreadsheet components that allow you to work with Excel spreadsheets without having a dependency on Excel, see DevDirect Spreadsheet Components.
SyncFusion Essential Calculate.
Quote:
... you can fully load, manipulate, and compute Excel spreadsheets with no dependence on Excel.
There is no way to compile an Excel spreadsheet into C# code. They are not in any way the same "thing", in the same way that you can't compile this text that you are reading, or a Word document, into code or an assembly of some kind.
You have to be more specific with the kind of functionality you want to get, which will help you get an answer.
There are several ways (pointed out in other answers here) to allow you to access Excel spreadsheet data and utilize them, but this is in no way compiling them.