I'm struggling to figure out how to convert a CSV file into a database. I've tried a few methods but I can't wrap my head around it. I have a CSV file with thousands of rows that I need to load into an SQLite database using C#. Any help is appreciated!
You can use MS Excel: open the CSV file and specify your field separator as needed (I believe it defaults to tab-delimited). Then save your thousands of rows as an Excel workbook (.xlsx) instead of the character-separated (CSV) format.
Then you can use the open-source Open XML SDK to open the Excel document and work with it through its object model. From there, you can programmatically create your new database using SQL statements.
Query the spreadsheet's header row to get your column names. Of course, you'll need to make sure your source CSV provides appropriate headers; if it doesn't, they can easily be added to the top of the file.
E.g.
https://learn.microsoft.com/en-us/office/open-xml/how-to-get-a-column-heading-in-a-spreadsheet
Next, iterate over the rows and build an INSERT statement for each one. The Open XML documentation, Microsoft Docs, and existing Stack Overflow answers have sample code showing how to do this.
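If the CSV is well-formed, the same two steps (header row as column names, then one INSERT per row) can also be applied to the CSV directly, without the Excel round trip. Below is a minimal sketch in Python using only the standard `csv` and `sqlite3` modules, chosen for brevity rather than taken from the answer; the function names `csv_to_sqlite` and `quote_ident` and the table name are hypothetical. It uses parameterized INSERTs inside a single transaction, which for thousands of rows is typically far faster than committing row by row, and it stores every value as text.

```python
import csv
import io
import sqlite3
from typing import TextIO


def quote_ident(name: str) -> str:
    # Double-quote an SQL identifier, escaping embedded quotes, so header
    # text such as "Unit Price" can be used as a column name.
    return '"' + name.replace('"', '""') + '"'


def csv_to_sqlite(stream: TextIO, conn: sqlite3.Connection, table: str) -> int:
    """Create `table` from the CSV header row and insert every data row.

    Returns the number of rows inserted. Columns get no declared type,
    so SQLite keeps the values as the strings the csv module produced.
    """
    reader = csv.reader(stream)
    header = next(reader)  # first row supplies the column names
    rows = list(reader)
    columns = ", ".join(quote_ident(h) for h in header)
    placeholders = ", ".join("?" for _ in header)
    with conn:  # inserts share one transaction: committed on success, rolled back on error
        conn.execute(f"CREATE TABLE {quote_ident(table)} ({columns})")
        conn.executemany(
            f"INSERT INTO {quote_ident(table)} VALUES ({placeholders})", rows
        )
    return len(rows)


if __name__ == "__main__":
    # Hypothetical sample data; for a real file use open(path, newline="").
    sample = io.StringIO("name,qty\nwidget,3\ngadget,5\n")
    db = sqlite3.connect(":memory:")
    print(csv_to_sqlite(sample, db, "items"))
    print(db.execute("SELECT * FROM items").fetchall())
```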
How to read xslx with open XML SDK based on columns in each row in C#?
I have multiple .xlsb templates that have 5-6 sheets in them.
The first 2-3 sheets are where I need to add data dynamically, so that the 4th and 5th sheets can generate data based on the macros/formulas written in them.
Can anyone suggest what I can use for this?
What I have already tried:
ExcelDataReader - it did not work consistently for xlsb.
LinqToExcel - I could read the data from the custom templates by creating models and fetching the related data through LINQ.
However, I can't find anything I can use to write/export data to specific sheets of an xlsb file.
Can anyone help me out here? I cannot convert the templates to another format; they have to be xlsb.
I am calling a web service and the data from the web service is in csv format.
If I try to save data in xls/xlsx, then I get multiple sheets in a workbook.
So how can I save the data as CSV with multiple tabs/sheets in C#?
I know CSV with multiple tabs is not practical, but is there any way at all, or any library, to save data as CSV with multiple tabs/sheets?
CSV, as a file format, assumes one "table" of data; in Excel terms that's one sheet of a workbook. While it's just plain text, and you can interpret it any way you want, the "standard" CSV format does not support what your supervisor is thinking.
You can fudge what you want in a few ways:
Use a different file for each sheet, with related but distinct names, like "Book1_Sheet1", "Book1_Sheet2" etc. You can then find groups of related files by the text before the first underscore. This is the easiest to implement, but requires users to schlep around multiple files per logical "workbook", and if one gets lost in the shuffle you've lost that data.
Do the above, and also zip the files into a single archive you can move around. You keep the pure-CSV advantage of the above option, plus the convenience of moving one file instead of several, but the downside is having to zip/unzip the archive to get at the actual files. To ease the pain, if you're on .NET 4.5 or later you have access to a built-in ZipFile implementation; if you are not, you can use the open-source DotNetZip or SharpZipLib. Either way, you can programmatically create and consume standard Windows ZIP files. You can also use the nearly universal .tar.gz (aka .tgz) combination, but your users will need either your program or a third-party compression tool like 7-Zip or WinRAR to work with the archive.
Implement a quasi-CSV format where a blank line (containing only a newline) acts as a "tab separator", and your parser expects a new row of column headers followed by data rows for the next sheet. This variant of standard CSV likely won't be readable by other CSV consumers, since it doesn't follow the expected file format, so I would recommend not using the ".csv" extension; it will confuse and frustrate users who expect to open it in other applications like spreadsheets.
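The second and third options can be sketched with Python's standard `zipfile` and `csv` modules (Python for brevity; in .NET 4.5+ the built-in ZipFile class mentioned above would play the role of `zipfile`). The `<book>_<sheet>.csv` naming follows the first option; the function names are hypothetical, and every value comes back as a string. For the quasi-CSV reader, `csv.reader` returns an empty list for a blank line, which serves as the section break, while quoted fields that themselves contain line breaks are still parsed as single values.

```python
import csv
import io
import zipfile


def write_workbook_zip(book, sheets, archive):
    """Option 2: store each sheet as <book>_<sheet>.csv inside one ZIP file.

    `sheets` maps a sheet name to a list of rows; `archive` is a path or a
    writable binary file object.
    """
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for sheet, rows in sheets.items():
            buf = io.StringIO()
            csv.writer(buf).writerows(rows)
            zf.writestr(f"{book}_{sheet}.csv", buf.getvalue())  # str is stored as UTF-8


def read_workbook_zip(archive):
    """Inverse of write_workbook_zip: {"<book>_<sheet>": rows} for each CSV."""
    result = {}
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if name.endswith(".csv"):
                text = zf.read(name).decode("utf-8")
                result[name[: -len(".csv")]] = list(csv.reader(io.StringIO(text)))
    return result


def read_sections(text):
    """Option 3: split a quasi-CSV into (header, data_rows) per blank-line section."""
    sections, current = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # csv.reader yields [] for an empty line: start the next "sheet"
            if current:
                sections.append(current)
                current = []
        else:
            current.append(row)
    if current:
        sections.append(current)
    return [(rows[0], rows[1:]) for rows in sections]
```

For example, `write_workbook_zip("Book1", {"Sheet1": rows1, "Sheet2": rows2}, "Book1.zip")` produces one file to hand around, and `read_workbook_zip("Book1.zip")` recovers both sheets.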
If I try to save data in xls/xlsx, then I get multiple sheets in a workbook.
Your answer is in your question: don't use text/csv, but save it as xls, xlsx, ods, or another format that does have sheets. text/csv most certainly cannot do multiple sheets; it can't even do one sheet. There's no such thing as a sheet in text/csv, although some applications, like Excel or Calc, choose to import it into a format that does have sheets.
Both XLSX and ODS are much more complicated than text/csv, but are each probably the most straightforward of their respective sets of formats.
I've been using this library for a while now in my projects:
https://github.com/SheetJS/js-xlsx
I use it to import data and structure from formats like xls(x), csv, and xml, but you can certainly save to those formats as well (all client-side)!
Hope that helps. Take a look at the online demo:
http://oss.sheetjs.com/js-xlsx/
You can also peek at the source code or file an issue on GitHub, but I think you will have to do most of the coding on your own.
I think you want to reduce the size of your Excel file. If so, you can save it as .xlsb, i.e. the Excel Binary Workbook format. You can reduce the file size further by deleting all the blank cells.
No, ADO.NET will not solve my problem, because the Excel files I'm working with do not contain information in tabular form. In other words, there is nothing to query, and the names and number of sheets will vary.
Essentially, my job is to search every single cell in an Excel document and validate it against some other data.
Right now all I have is a byte[] that represents the contents of an .xls file. Converting it to a string is meaningless, since it's just binary data.
If I use COM interop and run Excel in the background, is it possible to inject it with binary data in byte[] array form or do I have to save the file to disk and then automate the process of opening it and scanning each row?
Isn't there an easier way to do it?
How do you read the binary data of an Excel file (.xls) using .NET?
There are a number of ways. The Excel file format has changed a few times, so reading the files natively is hard work and version-dependent, and it's usually not recommended. For reading tabular data most people choose ADO.NET, but as you allude, if you need any formatting or discovery, MS would recommend COM interop.
If I use COM interop and run Excel in the background, is it possible to inject it with binary data in byte[] array form
The Excel COM object model does allow you to bulk-set data on a Range object: you assign it a two-dimensional object array (object[,]).
or do I have to save the file to disk and then automate the process of opening it and scanning each row?
No, you can interact with the out-of-process COM server (Excel) without having to save first; you can set your data, format it, etc. in memory.
Isn't there an easier way to do it?
Yes, there is: check out SpreadsheetGear. Its object model is nearly identical to the COM model, but you do not need Excel involved at all, and it is an order of magnitude faster when working with large data. It's not cheap (about $1000 last time I checked), but it will save you far more than that in coding effort. (I am not affiliated with SpreadsheetGear in any way.)
You could use NPOI to open and read your XLS files; you'll basically want to loop through the sheets, rows, and columns looking for data. I commonly use NPOI to read and write XLS forms that contain data in arbitrary cells throughout a worksheet.
I want to create a data structure that I can copy to the clipboard in such a way that the user can paste it into an Excel worksheet and it inserts correctly into the columns and rows of the sheet.
Is there any way to create such a data structure? Or does it already exist?
I would like to avoid opening Excel and pasting the values myself, because I can't be sure the worksheet will look the same in the future, so I'd rather have the user paste the rows and columns where they want them.
When copying the data to the clipboard, format it with tabs separating columns and newlines (Enter) separating rows. When it is pasted into Excel, the values will automatically go into the corresponding rows and columns.
You can copy your data to the clipboard in a tab-delimited text format.
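The tab/newline layout described in these answers takes only a few lines to build. A minimal Python sketch follows (the function name is hypothetical); it only produces the string, since putting text on the clipboard is framework-specific (in WinForms, for instance, `Clipboard.SetText` accepts a string). Rows are joined with CRLF, the Windows line ending, and as a simplifying assumption any tab or line break inside a cell is replaced with a space so it cannot spill into a neighbouring column or row.

```python
def to_clipboard_table(rows):
    """Format a 2-D table as tab-separated columns and CRLF-separated rows."""

    def cell(value):
        text = "" if value is None else str(value)
        # Assumption: embedded tabs/newlines would shift the paste, so flatten them.
        for ch in ("\r\n", "\r", "\n", "\t"):
            text = text.replace(ch, " ")
        return text

    return "\r\n".join("\t".join(cell(v) for v in row) for row in rows)


if __name__ == "__main__":
    print(repr(to_clipboard_table([["Model", "Price"], ["X1", 199], ["X2", None]])))
```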
A tab or comma delimited string is the easiest and least technical solution.
If you want something a little more sophisticated, there are some superb libraries around (search CodePlex) for creating Excel documents in managed code.
Or you could use the interop libraries that come as part of the Visual Studio Office integration.
Or you could generate the XML-based XLSX format directly.
I have saved the specifications of a mobile phone into an Excel spreadsheet. Now I want to copy the data from this Excel sheet into a database programmatically.
How can I do this? Is there any way? I copied the phone specs from gsmarena.com. Please help me.
It's completely possible. There are several ways, not all of which require programming.
If the Excel sheet is in a tabular format (one "record" per row, with static column names), you may be able to use your database management tool (SQL Server Management Studio for SQL Server, for instance) to do a bulk insert. Consult the documentation for your particular database.
You can also use Excel formulas to create a column in the sheet that contains an INSERT statement for each row. Simply copy that column out of the sheet, paste it into a SQL query window, maybe wrap it in a transaction, and then hit "execute". I've done this a few times when migrating data; it's a bit of a one-off, but for a one-time operation that's just fine.
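The formula column amounts to string concatenation plus escaping. The same idea is sketched below in Python, with hypothetical function, table, and column names; identifiers are assumed to be plain names that need no quoting. Doubling embedded single quotes matters: a value such as "it's" would otherwise end the string literal early. In an Excel formula the equivalent escaping can be done with `SUBSTITUTE(A2, "'", "''")`.

```python
def sql_literal(value):
    """Render a value as a SQL literal: NULL, a bare finite number, or a quoted string."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return repr(value)
    return "'" + str(value).replace("'", "''") + "'"  # double embedded quotes


def insert_statements(table, columns, rows):
    """One INSERT per row, like a helper column of formulas in the sheet."""
    column_list = ", ".join(columns)
    statements = []
    for row in rows:
        values = ", ".join(sql_literal(v) for v in row)
        statements.append(f"INSERT INTO {table} ({column_list}) VALUES ({values});")
    return statements


if __name__ == "__main__":
    # Hypothetical phone-spec rows; paste the output into your SQL tool.
    for stmt in insert_statements("phones", ["model", "screen_in", "notes"],
                                  [["X1", 6.1, "it's new"], ["X2", 5, None]]):
        print(stmt)
```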
If you want a program to solve the problem, you'll need to use OLE automation to open your Excel sheet and programmatically iterate over the rows and columns to build a "record" that you save to the database. The exact details are a little involved, and depend heavily on the type of database, your version of Excel, the type of Excel document you have (XLS or XLSX), and how your sheet is organized.
Well, for starters, is the database properly designed and ready to go? If not, you need to design one first, which I suggest you base on a normalized version of the Excel data. Once that's done, you can use the .NET interop libraries to pull the data from Excel and write it to the DB, whether that's MySQL, Access, or some other DBMS.
You could use a tool such as SSIS, or connect via ADO.NET with the OleDb/Jet drivers.