I'm beginning to program in C#. I can use Visual Studio Express 2012 for this purpose. I'm trying to create application that will import data from xml spreadsheet 2003 from specific column (but not specified number of entries in that column) and it will list text from each cell (all of them in that column).
I have read few topics about it, like this one:
http://social.msdn.microsoft.com/Forums/windowsapps/en-US/4fce4765-2d05-4a2b-8d0a-6219e87f3307/reading-excel-file-using-c-in-winrt-platform?forum=winappswithcsharp
but most of the answers are related with Visual Studio 2012 not the express version, thus I'm limited with libraries and extensions. Most of this solutions when I try to use them, don't work in my VS Express 2012 cause they are missing something.
This program is working for me and is returning value of one specific cell. How can I change it, so it will read every cell from that column, assign every value to a table (or maybe variable) so I can work with this content and maybe randomize order later?
namespace UnitTest
{
public class TestCode
{
//ReadExcelCellTest
public static void Main()
{
XDocument document = XDocument.Load(#"C:\Projekt2\File1.xml");
XNamespace workbookNameSpace = #"urn:schemas-microsoft-com:office:spreadsheet";
// Get worksheet
var query = from w in document.Elements(workbookNameSpace + "Workbook").Elements(workbookNameSpace + "Worksheet")
where w.Attribute(workbookNameSpace + "Name").Value.Equals("Sheet1")
select w;
List<XElement> foundWoksheets = query.ToList<XElement>();
if (foundWoksheets.Count() <= 0) { throw new ApplicationException("Worksheet Settings could not be found"); }
XElement worksheet = query.ToList<XElement>()[0];
// Get the row for "Seat"
query = from d in worksheet.Elements(workbookNameSpace + "Table").Elements(workbookNameSpace + "Row").Elements(workbookNameSpace + "Cell").Elements(workbookNameSpace + "Data")
where d.Value.Equals("StateID")
select d;
List<XElement> foundData = query.ToList<XElement>();
if (foundData.Count() <= 0) { throw new ApplicationException("Row 'StateID' could not be found"); }
XElement row = query.ToList<XElement>()[0].Parent.Parent;
// Get value cell of Etl_SPIImportLocation_ImportPath setting
XElement cell = row.Elements().ToList<XElement>()[1];
// Get the value "Leon"
string cellValue = cell.Elements(workbookNameSpace + "Data").ToList<XElement>()[0].Value;
Console.WriteLine(cellValue);
}
}
}
I believe that any version of Visual Studio would allow you to use open-source libraries:
http://closedxml.codeplex.com/
Link to doc: http://closedxml.codeplex.com/wikipage?title=Finding%20and%20extracting%20the%20data&referringTitle=Documentation
Reading Excel cell value should be much easier then.
How can I change it, so it will read every cell from that column,
assign every value to a table (or maybe variable) so I can work with
this content and maybe randomize order later?
I hope that closedxml will help you achieve this task.
Handling Excel sheets is not as easy as one might expect. For example: cell content is often just a reference to the real value stored in a dictionary (that's what the List<Dictionary<string, string>> is for in the code in the forum topic you linked). Also, other whacky things can happen, like receiving unexpected NULL cells at the end of a row.
I don't know how low level you want to keep your code. If there's a possibility that the functionality will evolve, you better look for some libraries. Take a look at Microsoft's own Open XML SDK: Open XML SDK 2.5. That provides support for all XML based office formats. There are two downsides: it's 12MB+ assembly, and the other is that this one is still not as high level as I expected. But you get some concepts like rows and columns.
The other alternatives are some non Microsoft libraries. There are numerous ones on CodePlex and other open source repos. Watch out to select something which is active and updated. Take a look at the issue section of the project, you'll see that there are usually many. You can see how they are handled. Many projects focus on Excel only, which is probably what you need, and will be smaller than a general OpenXML solution.
You can get paid product. Finally I ended up using SmartXLS, because it turned out our company had a license.
Related
I am working with a client to import a rather larger Excel file (over 37K rows) into a custom system and utilizing the excellent LinqToExcel library to do so. While reading all of the data in, I noticed it was breaking on records about 80% in and dug a little further. The reason it fails is the majority of records (with associated dates ranging 2011 - 2015) are normal, e.g. 1/3/2015, however starting in 2016, the structure changes to look like this: '1/4/2016 (note the "tick" at the beginning of the date) and LinqToExcel starts returning a DBNull for that column.
Any ideas on why it would do that and ways around it? Note that this isn't a casting issue - I can use the Immediate Window to see all the values of the LinqToExcel.Row value and where that column index is, it's empty.
Edit
Here is the code I am using to read in the file:
var excel = new LinqToExcel.ExcelQueryFactory(Path.Combine(this.FilePath, this.CurrentFilename));
foreach (var row in excel.Worksheet(file.WorksheetName))
{
data.Add(this.FillEntity(row));
}
The problem I'm referring to is inside the row variable, which is a LinqToExcel.Row instance and contains the raw data from Excel. The values inside row all line up, with the exception of the column for the date which is empty.
** Edit 2 **
I downloaded the LinqToExcel code from GitHub and connected it to my project and it looks like the issue is even deeper than this library. It uses an IDataReader to read in all of the values and the cells in question that aren't being read are empty from that level. Here is the block of code from the
LinqToExcel.ExcelQueryExecutorclass that is failing:
private IEnumerable<object> GetRowResults(IDataReader data, IEnumerable<string> columns)
{
var results = new List<object>();
var columnIndexMapping = new Dictionary<string, int>();
for (var i = 0; i < columns.Count(); i++)
columnIndexMapping[columns.ElementAt(i)] = i;
while (data.Read())
{
IList<Cell> cells = new List<Cell>();
for (var i = 0; i < columns.Count(); i++)
{
var value = data[i];
//I added this in, since the worksheet has over 37K rows and
//I needed to snag right before it hit the values I was looking for
//to see what the IDataReader was exposing. The row inside the
//IDataReader relevant to the column I'm referencing is null,
//even though the data definitely exists in the Excel file
if (value.GetType() == typeof(DateTime) && value.Cast<DateTime>() == new DateTime(2015, 12, 31))
{
}
value = TrimStringValue(value);
cells.Add(new Cell(value));
}
results.CallMethod("Add", new Row(cells, columnIndexMapping));
}
return results.AsEnumerable();
}
Since their class uses an OleDbDataReader to retrieve the results, I think that is what can't find the value of the cell in question. I don't even know where to go from there.
Found it! Once I traced down that it was the OleDbDataReader that was failing and not the LinqToExcel library itself, it sent me down a different path to look around. Apparently, when an Excel file is read by an OleDbDataReader (as virtually all utilities do under the covers), the first few records are scanned to determine the type of content associated with the column. In my scenario, over 20K records had "normal" dates, so it assumed everything was a date. Once it got to the "bad" records, the ' in front of the date meant it couldn't be parsed into a date, so the value was null.
To circumvent this, I load the file and tell it to ignore column headers. Since the header for this column is a string and most of the values are dates, it makes everything a string because of the mismatched types and the values I need are loaded properly. From there, I can parse accordingly and get it to work.
Source: What is IMEX in the OLEDB connection string?
I am struggling with the Open XML SDK and I've already read a lot of posts on this topic but cannot figure it out. My goal is to have a locally created Excel file which contains a formula and edit the input online and retrieve the calculated value online.
I don't know if this is possible since Open XML may only change the data and I wonder if it is also able to perform Excels calculations.
For example, my local file contains three cells:
A1: 1
A2: 2
A3: =(A1+A2)
Using Open XML I adjust A2 to the value of 3, however the result of A3 remains 3 instead of 4.
I have already read about Excel having to recalculate, but my goal is to have an Excel file as some sort of calculation engine instead of transfering all calculations to C#.
All tips and advice are welcome.
Kind regards, Patrick
First of all thanks for all responses.
Second I guess the response answered my question and the open XML SDK is only able to adjust the file and won't do anything regarding recaculating existing formulas in the file. This will only occur when opened in Excel. I will take a look at EPPlus.
You can use something like this.
Cell cell; //supposing this is your cell referencing A3
CellFormula cellformula = new CellFormula();
cellformula.Text = "SUM(A1, A2)";
CellValue cellValue = new CellValue();
cellValue.Text = "0";
cell.Append(cellformula);
cell.Append(cellValue);
A similar example can be found at this link: Formula cells in excel using openXML
I feel there could be some ambiguity captured in the question. OpenXML is way of storing documents, therefore it is not possible to do calculations with OpenXML SDK. It is spread sheet engine (Excel application) which performs the calculations.
When inputs are updated, spreadsheet saved, calculated values should get updated.
I've got a MS Word project where I'm building a number of Panes for users to complete some info which automatically populates text at bookmarks throughout the document. I'm just trying to find the best way of saving these values somehow that I can retrieve them easily when re-opening the document after users have typed in their values.
I could just try to retrieve them from the bookmarks themselves but of course in many cases they contain text values when I'd ideally want to store a primary key somewhere that's not visible to the user and just in case they made changes to the text which would make reverse engineering the values impossible.
I can't seem to find any information on saving custom attributes in a Word document, so would really appreciate some general guidance of how this might be achieved.
Thanks a lot!
I would suggest the use of custom document properties. there you can strings in a key -value manner (at least if it is similar to excel).
I found a thread which explains how to do it:
Set custom document properties with Word interop
After playing around with this a fair bit this is my final code in case it helps someone else, I've found this format easier to understand and work with. It's all based on the referenced article by Christian:
using Office = Microsoft.Office.Core;
using Word = Microsoft.Office.Interop.Word;
using System.Reflection;
Office.DocumentProperties properties = (Office.DocumentProperties)Globals.ThisDocument.CustomDocumentProperties;
//Check if the property exists already
if (properties.Cast<Office.DocumentProperty>().Where(c => c.Name == "nameofproperty").Count() == 0)
{
//Then add the property and value
properties.Add("nameofproperty", false, Office.MsoDocProperties.msoPropertyTypeString, "yourvalue");
}
else
{
//else just update the value
properties["nameofproperty"].Value = "yourvalue";
}
In terms of retrieving the value it's as easy as using the same three lines at the top to get the properties object, perhaps using the code in the if statement to check if it exists, and the retrieving it using properties["nameofproperty"].Value
I have an excel sheet and I am processing it by using Open XML SDK 2.0. The scenario is, There is a column which contains date in my excel sheet. I need to get the maximum and minimum date from that column.
I can do this by looping and reaching to that cell, doing comparisons and finding desired answer.
But due to optimality I want to do this by using LINQ to get Minimum and maximum dates.
Is it possible to do so? If yes, then how?
You can see how to get IEnumerable of all cells from column there:Read excel sheet data in columns using OpenXML, and use Max() on it.
Thanks to all
I have used like this
IEnumerable<Cell> cells = workSheetPart.Worksheet.Descendants<Cell>().Where(c => string.Compare(GetColumnName(c.CellReference.Value), strIndex, false) == 0).OrderBy(c => c.CellValue.Text);
And getting min and max values like this
int cellCount = cells.Count();
Cell MaxCell = cells.ToArray()[0];
Cell MinCell = cells.ToArray()[cellCount - 1];
You will want to take a look at the LINQ Min() and Max() functions. If you need to return the entire Cell object, you can use OrderByDescending().
You could read this post Open XML SDK and LINQ to XML by Eric White (Eric wrote a lot of posts about OpenXML and LINQ). And then you will be able to query your Excel file data with LINQ. You might want to see the spreadsheet objects structure in The Open XML SDK Productivity Tool, which can generate the source code for you. You could use this code to understand how to programmatically access the data you need.
How do I import data in Excel from a CSV file using C#? Actually, what I want to achieve is similar to what we do in Excel, you go to the Data tab and then select From Text option and then use the Text to columns option and select CSV and it does the magic, and all that stuff. I want to automate it.
If you could head me in the right direction, I'll really appreciate that.
EDIT: I guess I didn't explained well. What I want to do is something like
Excel.Application excelApp;
Excel.Workbook excelWorkbook;
// open excel
excelApp = new Excel.Application();
// something like
excelWorkbook.ImportFromTextFile(); // is what I need
I want to import that data into Excel, not my own application. As far as I know, I don't think I would have to parse the CSV myself and then insert them in Excel. Excel does that for us. I simply need to know how to automate that process.
I think you're over complicating things. Excel automatically splits data into columns by comma delimiters if it's a CSV file. So all you should need to do is ensure your extension is CSV.
I just tried opening a file quick in Excel and it works fine. So what you really need is just to call Workbook.Open() with a file with a CSV extension.
You could open Excel, start recording a macro, do what you want, then see what the macro recorded. That should tell you what objects to use and how to use them.
I beleive there are two parts, one is the split operation for the csv that the other responder has already picked up on, which I don't think is essential but I'll include anyways. And the big one is the writing to the excel file, which I was able to get working, but under specific circumstances and it was a pain to accomplish.
CSV is pretty simple, you can do a string.split on a comma seperator if you want. However, this method is horribly broken, albeit I'll admit I've used it myself, mainly because I also have control over the source data, and know that no quotes or escape characters will ever appear. I've included a link to an article on proper csv parsing, however, I have never tested the source or fully audited the code myself. I have used other code by the same author with success. http://www.boyet.com/articles/csvparser.html
The second part is alot more complex, and was a huge pain for me. The approach I took was to use the jet driver to treat the excel file like a database, and then run SQL queries against it. There are a few limitations, which may cause this to not fit you're goal. I was looking to use prebuilt excel file templates to basically display data and some preset functions and graphs. To accomplish this I have several tabs of report data, and one tab which is raw_data. My program writes to the raw_data tab, and all the other tabs calculations point to cells in this table. I'll go into some of the reasoning for this behavior after the code:
First off, the imports (not all may be required, this is pulled from a larger class file and I didn't properly comment what was for what):
using System.IO;
using System.Diagnostics;
using System.Data.Common;
using System.Globalization;
Next we need to define the connection string, my class already has a FileInfo reference at this point to the file I want to use, so that's what I pass on. It's possible to search on google what all the parameters are for, but basicaly use the Jet Driver (should be available on ANY windows install) to open an excel file like you're referring to a database.
string connectString = #"Provider=Microsoft.Jet.OLEDB.4.0;Data Source={filename};Extended Properties=""Excel 8.0;HDR=YES;IMEX=0""";
connectString = connectString.Replace("{filename}", fi.FullName);
Now let's open up the connection to the DB, and be ready to run commands on the DB:
DbProviderFactory factory = DbProviderFactories.GetFactory("System.Data.OleDb");
using (DbConnection connection = factory.CreateConnection())
{
connection.ConnectionString = connectString;
using (DbCommand command = connection.CreateCommand())
{
connection.Open();
Next we need the actual logic for DB insertion. So basically throw queries into a loop or whatever you're logic is, and insert the data row-by-row.
string query = "INSERT INTO [raw_aaa$] (correlationid, ipaddr, somenum) VALUES (\"abcdef", \"1.1.1.1", 10)";
command.CommandText = query;
command.ExecuteNonQuery();
Now here's the really annoying part, the excel driver tries to detect you're column type before insert, so even if you pass a proper integer value, if excel thinks the column type is text, it will insert all you're numbers as text, and it's very hard to get this treated like a number. As such, excel must already have the column type as the number. In order to accomplish this, for my template file I fill in the first 10 rows with dummy data, so that when you load the file in the jet driver, it can detect the proper types and use them. Then all my forumals that point at my csv table will operate properly since the values are of the right type. This may work for you if you're goals are similar to mine, and to use templates that already point to this data (just start at row 10 instead of row 2).
Because of this, my raw_aaa tab in excel might look something like this:
correlationid ipaddr somenum
abcdef 1.1.1.1 5
abcdef 1.1.1.1 5
abcdef 1.1.1.1 5
abcdef 1.1.1.1 5
abcdef 1.1.1.1 5
abcdef 1.1.1.1 5
abcdef 1.1.1.1 5
abcdef 1.1.1.1 5
Note row 1 is the column names that I referenced in my sql queries. I think you can do without this, but that will require a little more research. By already having this data in the excel file, the somenum column will be detected as a number, and any data inserted will be properly treated as such.
Antoher note that makes this annoying, the Jet Driver is 32-bit only, so in my case where I had an explicit 64-bit program, I was unable to execute this directly. So I had the nasty hack of writing to a file, then launch a program that would insert the data in the file into my excel template.
All in all, I think the solution is pretty nasty, but thus far haven't found a better way to do this unfortunatly. Good luck!
You can take a look at TakeIo.Spreadsheet .NET library. It accepts files from Excel 97-2003, Excel 2007 and newer, and CSV format (semicolon or comma separators).
Example:
var inputFile = new FileInfo("Book1.csv"); // could be .xls or .xlsx too
var sheet = Spreadsheet.Read(inputFile);
foreach (var row in sheet)
{
foreach (var cell in row)
{
// do something
}
}
You can remove beginning and trailing empty rows, and also beginning and trailing columns from the imported data using the Normalize() function:
sheet.Normalize();
Sometimes you can find that your imported data contains empty rows between data, so you can use another helper for this case:
sheet.RemoveEmptyRows();
There is a Serialize() function to convert any input to CSV too:
var outfile = new StreamWriter("AllData.csv");
sheet.Serialize(outfile);
If you like to use comma instead of the default semicolon separator in your CSV file, do:
sheet.Serialize(outfile, ',');
And yes, there is also a ToString() function too...
This package is available at NuGet too, just take a look at TakeIo.Spreadsheet.
You can use ADO.NET
http://vbadud.blogspot.com/2008/09/opening-comma-separate-file-csv-through.html
Well, importing from CSV shouldn't be a big deal. I think the most basic method would be to do it using string operations. You could build a pretty fine parser using simple Split() command, and getting the stuff in arrays.