Ways of searching an Excel file in ASP.NET - C#

Suppose I have an Excel file with more than 200,000 rows. What is the fastest way to search for a particular column value in this file using C# and ASP.NET? Any suggestions?

Assuming 1) you can cache the file contents (the file is not too large, does not change, etc.) and 2) you don't already have a mechanism for reading the file, I would read the file into memory once (at application start, lazily on first demand, or whatever suits). I have used and really like the FileHelpers library from http://www.filehelpers.com/ - see their Excel example at http://www.filehelpers.com/example_exceldatalink.html
As part of reading in the file, you'd likely also build some indexes for the later queries. If you only care about the one column, you could push all of its values into a HashSet, for instance, so that a later Contains call is fast.
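The HashSet idea can be sketched as follows. This is a minimal illustration, not a definitive implementation: the class name ColumnIndex and the columnValues parameter are hypothetical, and the values are assumed to have already been read from the file by whatever reader you use.

```csharp
using System;
using System.Collections.Generic;

public static class ColumnIndex
{
    // Builds a case-insensitive lookup set from the values of one column.
    // 'columnValues' is a hypothetical stand-in for the output of your file reader.
    public static HashSet<string> Build(IEnumerable<string> columnValues)
    {
        var index = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (string value in columnValues)
        {
            // Skip empty cells; trim stray whitespace so lookups match reliably.
            if (!string.IsNullOrWhiteSpace(value))
            {
                index.Add(value.Trim());
            }
        }
        return index;
    }
}
```

Build the set once (for example in Application_Start, or behind a Lazy<T>) and keep it in a static field; each later index.Contains(someValue) call is then a constant-time lookup on average instead of a scan over 200,000 rows. If you need the whole row back rather than a yes/no answer, a Dictionary keyed on the column value would play the same role.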

You cannot access an Excel file from ASP.NET at all if you are using the Excel Automation APIs. These were written for use in a desktop application, not in a server application like ASP.NET. They will not work, are not supported, and may very well violate your license agreement with Microsoft.
There are third-party libraries that can access an Excel file safely from ASP.NET. These do not use the Automation APIs.

You may want to consider using an "OLE DB for Jet 4.0" connection, which you can query via ADO.NET. OLE DB access to Excel is provided via the MDAC component, which comes standard on versions of Windows after 2000. ConnectionStrings.com has OLE DB connection strings for connecting to Excel, as well as information on using Jet in a 64-bit environment.
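A query through OLE DB might look roughly like the sketch below. The class name, the sheet and column arguments, and the choice of COUNT(*) are assumptions made for illustration; the Jet provider handles .xls files, while .xlsx needs the separately installed ACE provider. Check ConnectionStrings.com for the exact string that matches your setup.

```csharp
using System;
using System.Data.OleDb;

public static class ExcelOleDb
{
    // Picks a provider by file extension: Jet 4.0 for .xls, ACE 12.0 for .xlsx.
    public static string BuildConnectionString(string path)
    {
        bool isXlsx = path.EndsWith(".xlsx", StringComparison.OrdinalIgnoreCase);
        string provider = isXlsx ? "Microsoft.ACE.OLEDB.12.0" : "Microsoft.Jet.OLEDB.4.0";
        string format = isXlsx ? "Excel 12.0 Xml" : "Excel 8.0";
        // HDR=YES tells the provider that the first row holds column names.
        return "Provider=" + provider + ";Data Source=" + path +
               ";Extended Properties=\"" + format + ";HDR=YES\"";
    }

    // Returns true if any row of the sheet has 'value' in the named column.
    // 'sheet' and 'column' are concatenated into the SQL text, so they must
    // come from trusted configuration, never from user input.
    public static bool ColumnContains(string path, string sheet, string column, string value)
    {
        using (var connection = new OleDbConnection(BuildConnectionString(path)))
        using (OleDbCommand command = connection.CreateCommand())
        {
            command.CommandText =
                "SELECT COUNT(*) FROM [" + sheet + "$] WHERE [" + column + "] = ?";
            command.Parameters.AddWithValue("@value", value); // OLE DB parameters are positional
            connection.Open();
            return Convert.ToInt32(command.ExecuteScalar()) > 0;
        }
    }
}
```

Each call scans the sheet, so for many lookups against the same file an in-memory index is likely to be faster; OLE DB is most convenient when you want SQL-style filtering without writing a parser.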

Use EPPlus and read the file into a DataTable.
It could take some time, since that file is fairly big...
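A rough sketch of the EPPlus-to-DataTable step, assuming the EPPlus 4.x API and that the first row of the sheet holds unique column headers (both are assumptions; WorksheetLoader is a hypothetical name). EPPlus 5 and later changed their licensing and need a license setting before first use, so check the version you install.

```csharp
using System.Data;
using OfficeOpenXml;

public static class WorksheetLoader
{
    // Copies a worksheet into a DataTable, treating the first used row as headers.
    // All values are stored as their displayed text.
    public static DataTable ToDataTable(ExcelWorksheet sheet)
    {
        var table = new DataTable(sheet.Name);
        if (sheet.Dimension == null)
        {
            return table; // the sheet has no used cells
        }

        int firstRow = sheet.Dimension.Start.Row;
        int lastRow = sheet.Dimension.End.Row;
        int firstCol = sheet.Dimension.Start.Column;
        int lastCol = sheet.Dimension.End.Column;

        for (int col = firstCol; col <= lastCol; col++)
        {
            string header = sheet.Cells[firstRow, col].Text;
            // Duplicate header names would make Columns.Add throw.
            table.Columns.Add(string.IsNullOrEmpty(header) ? "Column" + col : header);
        }

        for (int row = firstRow + 1; row <= lastRow; row++)
        {
            DataRow dataRow = table.NewRow();
            for (int col = firstCol; col <= lastCol; col++)
            {
                dataRow[col - firstCol] = sheet.Cells[row, col].Text;
            }
            table.Rows.Add(dataRow);
        }
        return table;
    }
}
```

Typical use would be to open the file with new ExcelPackage(new FileInfo(path)), pass one of package.Workbook.Worksheets to ToDataTable, and cache the resulting table rather than reloading it on every request.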

Related

What is the standard way for dealing with PowerPoint (.PPTX) files on the server?

I've been tasked with a feature that can generate PowerPoint files on the server using C#. I'd basically start with a template, and programmatically replace some text with live data from the database. I've been doing some research into this area for the past day and here's what I've found:
PowerPoint has this sort of thing built in, meaning it can connect to external data sources and pull in data. Most examples of this that I've found either use PowerPoint automation on the server (which I've been advised against) or assume a SQL Server backend. Our company uses Oracle for our RDBMS needs. Oracle has a solution for this called Oracle BI, but it requires a whole new web server setup to run various Java EE components and whatnot. I didn't look at the price, but knowing Oracle it's not cheap. It also requires new software to be installed on the end user's machine, which we really want to avoid.
Generating PowerPoint files on the fly is possible. The company that is basically the go-to guys for this problem (every help forum points to them, and they get all the rave reviews) is Aspose. They have .NET components for dealing with just about any Office format you can think of. The problem is, they are astronomically expensive. Just the PowerPoint component (a site license for up to 10 developers) would cost $3,995.
The third possibility is building a solution in-house. After all, a PPTX file is just XML, right? Well, looking closer, a PPTX is a ZIP archive. It contains many folders, each containing many XML files. Modifying a PPTX file would, correct me if I'm wrong, entail unzipping the file to a temporary directory, reading the XML file and modifying the contents, then packaging everything up again and writing the file out to the response stream. Perhaps there are libraries that can work with ZIP streams on the fly without extracting everything.
My Question: Are there easier ways to work with a PPTX file using .NET that don't require working with compressed XML files or buying very expensive software? Basically, we need to modify a PowerPoint file, change some text, and allow the user to download that generated file from a web server.
OpenXML is Microsoft's .Net library that lets you manipulate Office documents. It lets you open a PPTX file and provides an object model that wraps the XML contents.
Here's the link to the OpenXML SDK and the MSDN documentation.
I've used OpenXML to let an ASP.NET page dynamically generate Word documents from a database.
Don't use Office Interop on a web server. It's an all-around bad idea.
If you are only replacing text placeholders in files that will not otherwise change, the home-grown solution that finds the placeholders in the XML files inside the ZIP archive should be doable. .NET has had zip support for some time, and it is greatly improved if you are able to use .NET 4.5, so you shouldn't need to extract the archive to a temporary location at all.
PowerPoint should also support connecting directly to Oracle in the same way it supports connecting to SQL Server (just play around with the connection options), without needing the special Oracle BI stuff. However, I'd still prefer the home-grown solution, as the direct connection only works while the PowerPoint file can reach your database, which is typically only possible on your local LAN or over an active VPN.
If you want anything fancier than a simple text replacement, perhaps look for an Aspose competitor.
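The in-memory placeholder replacement could be sketched like this with System.IO.Compression (available from .NET 4.5). The class name, the ppt/slides/ filter, and the {{customer}}-style placeholder are assumptions for illustration. One caveat: PowerPoint can split typed text across several runs in the slide XML, so a placeholder is only found if it was stored as one contiguous string.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Security;
using System.Text;

public static class PptxTemplate
{
    // Returns a copy of the template with 'placeholder' replaced by 'value'
    // in every slide part. Nothing is written to disk.
    public static byte[] Fill(byte[] template, string placeholder, string value)
    {
        using (var buffer = new MemoryStream())
        {
            buffer.Write(template, 0, template.Length);
            using (var archive = new ZipArchive(buffer, ZipArchiveMode.Update, true))
            {
                foreach (ZipArchiveEntry entry in archive.Entries)
                {
                    bool isSlide =
                        entry.FullName.StartsWith("ppt/slides/", StringComparison.Ordinal) &&
                        entry.FullName.EndsWith(".xml", StringComparison.Ordinal);
                    if (!isSlide) continue;

                    using (Stream stream = entry.Open())
                    {
                        string xml;
                        using (var reader = new StreamReader(stream, Encoding.UTF8, true, 4096, true))
                        {
                            xml = reader.ReadToEnd();
                        }
                        if (!xml.Contains(placeholder)) continue;

                        // Truncate the entry, then write the modified XML back.
                        stream.SetLength(0);
                        using (var writer = new StreamWriter(stream, new UTF8Encoding(false), 4096, true))
                        {
                            // Escape the value so characters like & and < keep the XML valid.
                            writer.Write(xml.Replace(placeholder, SecurityElement.Escape(value)));
                        }
                    }
                }
            } // disposing the archive flushes the updated ZIP into 'buffer'
            return buffer.ToArray();
        }
    }
}
```

The returned byte array can be written straight to the HTTP response, so the template on disk is never modified and no temporary directory is needed.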

OLE DB vs OPEN XML SDK vs Excel.interop

I need to read XLSX files and extract a maximum amount of content from it. Which of the API's should I use?
OLE DB, open XML SDK, or Excel Interop?
Which is the easiest to use?
Can you retrieve all the information using one or the other, e.g. dates, times, merged cells, tables, pivot tables, etc.?
You can try all of them and choose the one that fits you most...
Depending on the data you want to read, I'd suggest using Open XML over Interop or OLE DB.
I don't know the Open XML SDK, but I have some experience with the EPPlus library, which I use a lot and can only say good things about: it's fast and easy to learn, and it has good examples. The library is based on the Office Open XML format, so I suppose it is much the same as the SDK you mentioned, and it can easily read and write Excel 2007 and 2010 files.
On the linked site you'll find the library itself, documentation, and some example "Hello World" projects to download.
Why that library first? Because with it you can read not only cell values but also their colors, fonts, widths and heights, merging, and all that detailed stuff, and you can modify them as well. What's more, you don't need Excel installed to do that.
In second place, in case you only need to extract tabular data from a worksheet, you could use OLE DB. I'm afraid you won't be able to extract any information about formats, colors, etc. with it, and the data must be laid out as a table in the worksheet so that you can treat it as a database table.
The last one is Interop, because:
- it's a COM library, so you need to be very careful when using it from .NET, as it's easy to cause ugly, hard-to-find memory leaks (confirmed by my own bad experience); if you don't release its objects properly, it leaves the Excel.exe process running,
- it's much slower than the previous methods,
- it adds almost nothing over the previous methods (EPPlus or OLE DB) and requires Excel to be installed on the client's machine, so why use it?
Good luck, then.

Do I need to have Excel installed to query spreadsheets using an OleDB Provider?

I am working on some software that uses an OleDB to open a .xls file, query some data, and fill a dataset with the results. I am now looking at using this software on systems that do not have Excel. Will my software still be able to read the spreadsheets?
This xls file contains 1000s of configuration settings that my software uses. If this setup won't work on computers without Excel, I'm guessing my next best move is to convert the xls file to an XML file and read it in using XML Services.
You don't need Office or the Office Data Connectivity Components installed; you can use the Jet OLE DB engine, which is installed on pretty much every Windows machine in existence. However, it's very old technology and is limited to 32-bit.
http://msdn.microsoft.com/en-us/library/ms175866.aspx
If you want to avoid this mess entirely, switch to an app.config file or a properties file and you get a pure .NET solution.
You need either the full Office or the Office Data Connectivity Components installed on the client computers.

How do I turn a flat file of data into a queryable data source

I generate files, let's call them .dwrf files, which contain a significant amount of data. Currently we export those to .CSV, and the resulting files are large (2GB+). I would like to cut out the export process and make the contents of a .dwrf file queryable directly from Excel or other applications.
What I would like to do is write a utility/service, let's call it dwrfMiner, that extracts data from the file and exposes it as a data source, and to link dwrfMiner to .dwrf files in some way so that Excel recognises it as an external data source.
Any ideas?
While writing an ODBC driver for this is probably overkill, if the format of the files you are working with is known in advance and isn't too hard to translate (it sounds like not considering you are already creating CSVs) then using an ODBC DSN sounds like your best bet.
There is a nice selection of ODBC drivers already built in to Windows (.txt, .csv, .mdb, .xl*, .dbf, Paradox .db, etc.), and you can obtain drivers for a lot of other common formats from the web.
If the size of the existing format you're exporting to is too onerous (CSV) then the logical point to start is a transformation of your data to something more space-conscious that has ODBC support.
Failing that, your last option is the overkill option (Writing an ODBC driver).
Excel can query external data sources, but beware that every version of Excel has a hard limit on the number of rows per worksheet. In Excel 2003 the limit is 65,536 rows; Excel 2007 and later allow 1,048,576.
See my question: reporting tool/viewer for large datasets (and I had much less than 2GB).
I used PHP FlatFile DB to query flat-files in the past
I'd get out gcc and write yourself a full ODBC driver for it. Then you can sit back and use SQL.
You know, if you're bored. ;)
Use an ODBC driver with multithreading.

How can I programmatically create, read, write an excel without having office installed?

I'm confused as hell with all the bazillion ways to read/write/create excel files. VSTO, OLEDB, etc, but they all seem to have the requirement that office must be installed.
Here is my situation: I need to develop an app which will take an excel file as input, do some calculations and create a new excel file which will basically be a modification of the first excel file. All with the constraint that the machine that runs this may not have office installed. (Don't ask why...)
I need to support all Excel formats. The only saving grace is that the spreadsheets themselves are really simple. Just a bunch of columns and values, nothing fancy. And unfortunately no CSV, as the end user might not even know what a CSV file is.
Write your spreadsheet in HTML table format:
<html>
<body>
<table>
<tr>
<td style="background-color:#acc3ff">Cell1</td>
<td style="font-weight:bold">Cell2</td>
</tr>
</table>
</body>
</html>
and give your file an .xls extension. Excel will convert it automatically.
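Generating such a file from data might look like the sketch below. HtmlTableExport is a hypothetical name, and the rows are assumed to be plain strings. Note that this only covers writing, and that Excel 2007 and later warn the user that the file's format and extension don't match before opening it.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text;

public static class HtmlTableExport
{
    // Renders rows of text as a bare HTML table that Excel can open.
    public static string Build(IEnumerable<IEnumerable<string>> rows)
    {
        var html = new StringBuilder("<html><body><table>");
        foreach (IEnumerable<string> row in rows)
        {
            html.Append("<tr>");
            foreach (string cell in row)
            {
                // Encode cell text so characters like < and & don't break the markup.
                html.Append("<td>").Append(WebUtility.HtmlEncode(cell)).Append("</td>");
            }
            html.Append("</tr>");
        }
        return html.Append("</table></body></html>").ToString();
    }
}
```

Saving the result with File.WriteAllText under a name ending in .xls gives the file described above; inline style attributes like the ones in the HTML sample can be added to the td elements in the same way.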
Without Office installed you'll need something designed to understand the Excel binary file format (unless you only want to open Office 2007 .xlsx files).
The best I've found (and that I use) is SpreadsheetGear, which in addition to being .NET native, is much faster and more stable than the COM/OLE solutions (which I've used in the past).
Read and write CSV files instead. Excel reads them just fine and they're easier to use. If you need to work against .xls files, then try supporting OpenOffice as well as Excel. OpenOffice can read and write Excel files.
Did you consider way number bazillion and one: using the Open XML SDK? You can retain styles and tweak it to your liking. Anything you can do in an actual file is possible to achieve programmatically. The SDK comes with a tool called Document Reflector that shows the underlying XML and even shows LINQ statements that can be used to generate it. That is key to playing around with it: seeing how the changes are made, then recreating them in code.
The only caveat is this will work for the new XML based formats (*.xlsx) not the older versions. There's a slight learning curve but more material is making its way on blogs and other sites.
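As a rough idea of what reading cells with the Open XML SDK involves, here is a minimal sketch. OpenXmlReader and its method names are hypothetical; the SDK types (SpreadsheetDocument, Cell, SharedStringTable) come from the DocumentFormat.OpenXml package. It returns raw stored values, so dates come back as serial numbers unless you also interpret the cell styles.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

public static class OpenXmlReader
{
    // Resolves a cell to text. Strings are usually stored once in the shared
    // string table and referenced from the cell by index.
    public static string GetText(Cell cell, SharedStringTable sharedStrings)
    {
        if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
        {
            int index = int.Parse(cell.CellValue.Text, CultureInfo.InvariantCulture);
            return sharedStrings.Elements<SharedStringItem>().ElementAt(index).InnerText;
        }
        return cell.CellValue != null ? cell.CellValue.Text : cell.InnerText;
    }

    // Yields (cell reference, text) pairs, e.g. ("A1", "PartNumber"), for the first sheet.
    public static IEnumerable<KeyValuePair<string, string>> ReadFirstSheet(string path)
    {
        using (SpreadsheetDocument document = SpreadsheetDocument.Open(path, false))
        {
            WorkbookPart workbook = document.WorkbookPart;
            Sheet sheet = workbook.Workbook.Descendants<Sheet>().First();
            var worksheet = (WorksheetPart)workbook.GetPartById(sheet.Id.Value);
            SharedStringTable shared = workbook.SharedStringTablePart != null
                ? workbook.SharedStringTablePart.SharedStringTable
                : null;

            foreach (Cell cell in worksheet.Worksheet.Descendants<Cell>())
            {
                yield return new KeyValuePair<string, string>(
                    cell.CellReference.Value, GetText(cell, shared));
            }
        }
    }
}
```

The shared string indirection is the main thing that surprises people coming from other libraries; writing a file follows the same object model in reverse.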
If cost is not an issue, I'd suggest looking in Aspose's Excel product. I use their Word product and I've been satisfied.
Aspose.Cells
Excel XLSX files are "just" XML files, or more precisely ZIP files containing several XML files. Just rename an Excel file Test.xlsx to Test.zip and open it with your favourite ZIP program. The XML schemas are, afaik, standardized and available. But I think it might not be that easy to manipulate them using only primitive XML processing tools and frameworks.
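The same inspection can be done in code instead of renaming the file. A small sketch using System.IO.Compression follows; PackageInspector is a hypothetical name. For a typical workbook the list includes parts such as [Content_Types].xml, xl/workbook.xml, xl/worksheets/sheet1.xml and xl/sharedStrings.xml.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class PackageInspector
{
    // Lists the names of the parts inside an .xlsx (or any other ZIP-based Office file).
    public static List<string> ListParts(Stream package)
    {
        // leaveOpen: true, so the caller stays responsible for the stream.
        using (var archive = new ZipArchive(package, ZipArchiveMode.Read, true))
        {
            return archive.Entries.Select(entry => entry.FullName).ToList();
        }
    }
}
```

Passing File.OpenRead("Test.xlsx") to ListParts shows the structure; each part can then be loaded with an XML reader, though a library that already understands the schemas is usually less work.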
Excel files are in a proprietary format so (afaik) you're not going to be able to do this without having the office interop available. Some third party tools exist (which presumably licence the format from MS?) but I've not used them myself to comment on their usefulness.
I assume that you can't control the base file format, i.e. simple CSV or XML formats aren't going to be possible?
I used to use a very nice library called CarlosAg, which uses Excel XML format. It was great (and Excel recognizes the format), and also incredibly fast. Check it out here.
Oh, as a side note, we used to use this for the very same reason you need it. The servers that generated these files were not able to have Excel installed.
If you cannot work with CSV files as per @RHicke's suggestion, I assume you are working on a web app, since a desktop app would be guaranteed to have XL installed as per the requirements.
In that case I'd say: create your processing app as a web service, and build an XL add-in that interacts with your web service directly from XL.
For XLSX files, look at using http://www.codeplex.com/ExcelPackage. Otherwise, some paid 3rd party solutions are out there, like the one David suggested.
I can understand the requirement of not having office installed on a server machine.
There are many libraries available, such as Aspose, though some of them require a license.
If you are targeting MS Excel formats, a native library from Microsoft, the ACE OLEDB data provider, is available; you can install it on a machine and start reading and writing programmatically. You need to define a connection string and commands as per your needs. (Ref: This article #yoursandmyideas talks about using this library, along with setup and troubleshooting information.)
