C# - Using OpenXML, how to read blank cells in Excel files

I searched for a solution but could not find one. I have a problem reading an Excel file with OpenXML. With complete data there is no problem, but when a row contains blank cells, the columns seem to shift to the left, and I get an index error because the values really have moved left. I found a suggested fix that inserts the missing cells in between, but when I tried it I got "Object reference not set to an instance of an object" while reading certain cells with the code below (the cell-insertion code comes from the answer to "How do I have Open XML spreadsheet "uncollapse" cells in a spreadsheet?"):
public static string GetCellValue(SpreadsheetDocument document, Cell cell)
{
SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart;
string value = cell.CellValue.InnerXml;
if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
{
return stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
}
else if (cell == null)
{
return null;
}
else
{
return value;
}
}
Are there any other ways to read blank cells as blank without the data shifting to the left?
All help will be appreciated! :)
Thanks!

In Open XML, the worksheet XML does not contain an entry for a blank cell, which is why blank cells are skipped when you iterate a row. I faced the same problem. The only solution is to apply some logic of your own.
For example:
When we read a cell, we can get its column name (A, B, C, etc.) with the following code:
string cellIndex = GetColumnName( objCurrentSrcCell.CellReference );
where
public static string GetColumnName(string cellReference)
{
// Create a regular expression to match the column name portion of the cell name.
Regex regex = new Regex("[A-Za-z]+");
Match match = regex.Match(cellReference);
return match.Value;
}
You can store these cells in a Hashtable, where the key is the cell's column name and the value is the cell object. When writing, fetch the cells from the Hashtable in order, according to your own logic. For example,
you may loop from A to Z and read the cell at each key like this:
if(objHashTable.Contains(yourKey))
{
Cell objCell = (Cell) objHashTable[yourKey];
//Insertcell or process cell
}
else
{
//do process for the empty cell like you may add a new blank cell
Cell objCell = new Cell();
//Insert cell or process cell
}
This is the only way to work with Open XML here. Adding blank cells to the document while reading is a waste of time. You can add more logic as you need.
Try this; it will definitely work. If you find a better solution, do tell me.
Have a nice day :)
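The column-name approach above can be sketched without the Hashtable by converting each column name to an index and writing the value into a fixed-width row. This is a minimal sketch, not the answerer's code: SparseRowHelper, GetColumnIndex and ExpandRow are hypothetical names, and the input pairs stand in for the (CellReference, value) pairs you would read from each Cell with the Open XML SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SparseRowHelper
{
    // Extracts the column letters from a cell reference such as "C7" -> "C".
    public static string GetColumnName(string cellReference)
    {
        return Regex.Match(cellReference, "[A-Za-z]+").Value;
    }

    // Converts column letters to a zero-based index: "A" -> 0, "Z" -> 25, "AA" -> 26.
    // Returns -1 when the name is empty.
    public static int GetColumnIndex(string columnName)
    {
        int index = 0;
        foreach (char c in columnName.ToUpperInvariant())
        {
            index = index * 26 + (c - 'A' + 1);
        }
        return index - 1;
    }

    // Builds a dense row of the given width. Positions for which the file
    // contains no cell stay as empty strings, so nothing shifts to the left.
    public static string[] ExpandRow(IEnumerable<KeyValuePair<string, string>> cells, int width)
    {
        var row = new string[width];
        for (int i = 0; i < width; i++)
        {
            row[i] = string.Empty;
        }
        foreach (var cell in cells)
        {
            int col = GetColumnIndex(GetColumnName(cell.Key));
            if (col >= 0 && col < width)
            {
                row[col] = cell.Value;
            }
        }
        return row;
    }
}
```

With this, a row that only contains cells A1 and C1 still yields three entries, with the middle one empty. Unlike a loop from A to Z, the index conversion also handles columns beyond Z (AA, AB, and so on).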

Related

How to iterate through a specific row in Excel table via Interop?

So, I'm writing a program that reads table data and puts the cell values in a List. It works, but there is one problem: UsedRange takes all cells on the sheet, so there are more items than I need. Also, when I specify the range with ["A:A", Type.Missing] it throws an exception:
System.ArgumentException: "HRESULT: 0x80070057 (E_INVALIDARG))"
So my question is: how do I do this correctly?
Code is:
foreach (Excel.Range row in usedRange)
{
for(int i=0; i<lastCell.Row; i++)
{
if (row.Cells[4, i + 1].Value2 != null)
{
personlist.Add(Convert.ToString(row.Cells[4, i + 1].Value2));
}
else { i++; }
}
foreach(var person in personlist) {
Console.WriteLine(person);
}
}
UPD: I need the last used row, which is why I'm using UsedRange. If there are any alternatives, such as checking for != null, I will gladly try them.
I tried giving it a specific range, and tried code like the one here: C# - How do I iterate all the rows in Excel._Worksheet?
and here
https://overcoder.net/q/236542/программно-получить-последнюю-заполненную-строку-excel-с-помощью-c
but maybe I'm missing something, because there is more than one article about it and none of them works for me.
The problem is that the 'used range' can include empty cells (who knows how Excel decides that magic number: if you type a letter on some arbitrary row and then delete it, Excel can decide that cell is still part of your used range). You want your own custom definition of what a 'usedRange' is, which presumably is the range of non-blank rows. There are two straightforward ways of implementing this yourself (which gives you added control over it should you want to customize it).
You can just filter the list after the fact, removing all blank entries. Or you can process the list in reverse, skipping rows until you find one matching your criteria:
bool startProcessing = false;
for(int i=lastCell.Row-1; i>=0; i--)
{
if(!startProcessing){//bool is in case you want blank rows in the middle of the file, otherwise check valid row always
//check if valid row
//continue; if not, set startProcessing to true if yes
}
if (row.Cells[4, i + 1].Value2 != null)
{
personlist.Add(Convert.ToString(row.Cells[4, i + 1].Value2));
}
//else { i++; } //this is a bug, will cause a line skip
}
Also, as an aside: when you call i++; in the body of your for loop, the loop header then increments i again, so i advances by 2 and a row is skipped. Use continue; or just remove the else block altogether.
There's probably a way to get a cellRange matching your criteria, but imo doing it yourself can be better - you can ensure it does exactly what you want.
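The "process in reverse" idea can be sketched on plain values, independent of Interop. This is a minimal sketch under stated assumptions: ColumnTrimmer and TrimTrailingBlanks are hypothetical names, and the input list stands in for the Value2 results you have already read from one column of UsedRange, in row order.

```csharp
using System;
using System.Collections.Generic;

public static class ColumnTrimmer
{
    // Returns the values up to and including the last non-blank entry.
    // Blank entries in the middle are kept as empty strings so that
    // list positions still correspond to row positions.
    public static List<string> TrimTrailingBlanks(IList<object> values)
    {
        // Walk backwards until the first non-blank value is found.
        int last = values.Count - 1;
        while (last >= 0 && IsBlank(values[last]))
        {
            last--;
        }

        var result = new List<string>(last + 1);
        for (int i = 0; i <= last; i++)
        {
            result.Add(IsBlank(values[i]) ? string.Empty : Convert.ToString(values[i]));
        }
        return result;
    }

    // A value counts as blank when it is null or only whitespace.
    private static bool IsBlank(object value)
    {
        return value == null || string.IsNullOrWhiteSpace(Convert.ToString(value));
    }
}
```

The index of the last element in the returned list gives your own "last used row" for that column, regardless of what Excel reports as the used range.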

Copy text and preserve formatting from Microsoft Word to Excel using C#

I am currently working on a project that converts a test from a "standard" word format to a format that is accepted by the new Saras program we are using.
I have been able to parse the file and gather the information that I want. Unfortunately there is a problem with the formatting, I only know how to parse and insert plain text.
A few options I have found so far include using the .Copy() method and then using the .Paste() method on the cell. I am using a class to hold the data in an intermediate form, so that if another parsing algorithm is needed for the next person's "standard" format, the data can still be put into the Excel document the same way and the developer only needs to worry about parsing the new format.
NOTE: If there is a way to paste this into a format (like a stand alone cell) and then set the cell or cell value to it, that would be great.
Another option that I have found is using the interop.word.range.formattedtext property. This would work well if I was going to go from one word document to another, but I need to somehow get this formatted text over to an excel cell.
My current thought process would be to take the text and put it into a RTF object. Later I would take it and put it into excel by inserting the RTF.text into the cell and use the RTF.format to format each character. This seems like more work than I should have to do.
Please let me know what you all come up with!
Here is some code:
STORAGE:
class Question
{
public string text;
private List<string> responses;
public Question(string txt, List<string> r)
{
text = txt;
responses = r;
}
public List<string> getResponses() { return responses; }
}
PARSING:
if (p < MAX && docs.Paragraphs[p].Range.ListFormat.ListValue != 0)
{
// question main line
qTxt = docs.Paragraphs[p].Range.Text.ToString();
p++;
// question pos responses NOTE: THIS WILL BE USED TO DO MULTIPLE CHOICE LATER
for (; p < MAX && docs.Paragraphs[p].Range.ListFormat.ListValue == 0; p++)
{
tmp = docs.Paragraphs[p].Range.Text.ToString();
if (tmp != null && tmp != "\r")
qTxt += " \r\n " + docs.Paragraphs[p].Range.Text.ToString();
}
}
SET THE VALUE:
foreach (Question question in questionList)
{
...
// ITEM TEXT
ws.Cells[row, 15].Value2 = question.text;
...
}
Again, this code works for setting plain text, I just want to know how to get the formatting in there as well.
Thanks in advance!
UPDATE:
I figured out a way to make the Copy and Paste work. Silly me, I can just keep the excel document open the entire time. The constructor now sets up the first row and the parseDocument function will then fill in the rows with the data.
THE STRUGGLE:
I am currently using the Copy and Paste functions, but it seems to be putting an image of the text into my document rather than the formatted text itself.
CODE:
// Get the title
Word.Range rng = docs.Paragraphs[p].Range.Duplicate;
int index = 0;
for (; p < MAX && docs.Paragraphs[p].Range.ListFormat.ListValue == 0; p++)
{
string tmp = docs.Paragraphs[p].Range.Text.ToString();
if (tmp != null && tmp != "\r")
++index;
}
rng.MoveEnd(Word.WdUnits.wdParagraph, index);
rng.Copy();
xlDoc.ws.Range["A1", "A1"].PasteSpecial(); // Pastes an image of the title
Alas, I am looking for any possible way around this. Please let me know if you have any solutions.
Thanks!!
PS. I will keep updating this post if I make any progress.

How to get format type of cell using c# in spreadsheetlight

I am using the SpreadsheetLight library to read Excel sheet (.xlsx) values using C#.
I can read the cell value using the following code:
for (int col = stats.StartColumnIndex; col <= stats.EndColumnIndex; col++)
{
var value= sheet.GetCellValueAsString(stats.StartRowIndex, col); //where sheet is current sheet in excel file
}
I am getting the cell value. But how can I get the data type of the cell? I have checked in documentation but didn't find the solution.
Note: For .xls type of excel files i am using ExcelLibrary.dll library where i can easily get the datatype of cells using below code
for (int i = 0; i <= cells.LastColIndex; i++)
{
var type = cells[0, i].Format.FormatType;
}
but there is no similar method in spreadsheetlight.
Here is the answer from the developer Vincent Tang after I asked him as I wasn't sure how to use DataType:
Yes use SLCell.DataType. It's an enumeration, but for most data, you'll be working with Number, SharedString and String.
Text data will be SharedString, and possibly String if the text is directly embedded in the worksheet. There's a GetSharedStrings() or something like that.
For numeric data, it will be Number.
For dates, it's a little tricky. The data type is also Number (ignore the Date enumeration because Microsoft Excel isn't using it). For dates, you also have to check the FormatCode, which is in the SLStyle for the SLCell. Use the GetStyles() to get a list. The SLCell.StyleIndex gives you the index to that list.
For example, if your SLCell has cell value "15" and data type SharedString, then look for index 15 in the list of shared strings. If it's "blah" with String data type, then that's it.
If it's 56789 with Number type, then that's it.
Unless the FormatCode is "mm-yyyy" (or some other date format code), then 56789 is actually the number of days since 1 Jan 1900.
He also recommended using GetCellList() in order to obtain the list of SLCell objects in the sheet. However, for some reason that function was not available in my version of SL, so I used GetCells() instead. That returns a dictionary of SLCell objects, with keys of type SLCellPoint.
So for example to get the DataType (which is a CellValues object) of cell A1 do this:
using (SLDocument slDoc = new SLDocument("Worksheet1.xlsx", "Sheet1")) {
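SLCellPoint slCP = new SLCellPoint();
slCP.ColumnIndex = SLConvert.ToColumnIndex("A"); // Obviously 1, but a useful function to know
slCP.RowIndex = 1;
// GetCells() returns a dictionary keyed by SLCellPoint (as described above), so index into it
CellValues slCV = slDoc.GetCells()[slCP].DataType;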
}
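On the date point in the developer's reply: once the format code tells you a Number cell is a date, the stored serial number still has to be converted. The following is a minimal sketch, assuming the workbook uses Excel's default 1900 date system (not the 1904 one); ExcelSerialDate and ToDateTime are hypothetical names, while DateTime.FromOADate is the standard .NET method.

```csharp
using System;

public static class ExcelSerialDate
{
    // Converts an Excel serial number (1900 date system) to a DateTime.
    // From serial 61 (1 March 1900) onwards, Excel serials and OLE
    // Automation dates agree, so FromOADate can be used directly.
    // Below 61 Excel is one day behind, because it counts a
    // nonexistent 29 February 1900 (serial 60), so one day is added.
    // Serial 60 itself has no real date and maps to 1 March 1900 here.
    public static DateTime ToDateTime(double serial)
    {
        if (serial < 61)
        {
            serial += 1;
        }
        return DateTime.FromOADate(serial);
    }
}
```

For example, ExcelSerialDate.ToDateTime(43831) gives 1 January 2020, and the fractional part of a serial carries the time of day.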
By the way, I also had a problem with opening the chm help file. Try this:
Right Click on the chm file and select properties
Click on Unblock button at the bottom of the General Tab
To get the value of the cell try following code
var cellValue = (string)(excelWorksheet.Cells[10, 2] as Excel.Range).Value;
Use this link for more details
Check out the SLCell.DataType Property. Spreadsheetlight documentation mentions that this returns the Cell datatype, in class Spreadsheetlight.SLCell
public CellValues DataType { get; set; }
PS: On a side note, I figured out how to open the chm documentation. Try opening the chm file in Winzip, it opens without any issues.
Hope it helps. Thanks
Well, after a lot of trial and error I found a solution for this.
Based on the formatCode of a cell we can decide the formatType of the cell.
Using GetCellStyle method we can get the formatcode of the cell. Using this formatCode we can decide the formatType.
var FieldType = GetDataType(sheet.GetCellStyle(rowIndex, columnIndex).FormatCode);
private string GetDataType(string formatCode)
{
if (formatCode.Contains("h:mm") || formatCode.Contains("mm:ss"))
{
return "Time";
}
else if (formatCode.Contains("[$-409]") || formatCode.Contains("[$-F800]") || formatCode.Contains("m/d"))
{
return "Date";
}
else if (formatCode.Contains("#,##0.0"))
{
return "Currency";
}
else if (formatCode.EndsWith("%")) // no LINQ needed, and safe for an empty format code
{
return "Percentage";
}
else if (formatCode.IndexOf("0") == 0)
{
return "Numeric";
}
else
{
return "String";
}
}
This method worked for 99% of the cases.
Hope it helps you.

Get textbox value from Excel cell

A user has provided me with an Excel document that has textboxes in a few of the cells. I have the usual setup code to load the Excel application, get the worksheet, and then start iterating the used range. When I try to get the value of the cell that contains the textbox, the value is null.
foreach (Range row in usedRange.Rows) {
object[,] valueArray = (object[,])row.get_Value(XlRangeValueDataType.xlRangeValueDefault);
var value = valueArray[1,10]; // This is null for textbox cells
}
Is there a special method I should use to get the value of the textbox that appears in an Excel worksheet?
Edit with fix and explanation
Stewbob's suggestion of iterating the shapes got me in the right direction. But using the following code, I was getting null exceptions:
for (int i=1; i<shapes.Count;i++){
var item = shapes.Range[i].Item(1);
string myString = item.TextFrame2.TextRange.Characters.Text.ToString();
}
After looking at the object in QuickWatch, I noticed something odd about the shape. It was of type msoOLEControlObject. It turns out the values in this Excel document were cut and pasted into Excel from a web page. Excel was not creating text boxes but OLE boxes. The OLE box did have a 'Value' property, so I could access the text box's value as such:
var shapes = ws.Shapes;
for (int i=1; i<=shapes.Count;i++){ // the collection is 1-based, so <= is needed to include the last shape
var item = shapes.Range[i].Item(1);
var myText = item.OLEFormat.Object;
if (myText.Object != null) {
if (myText.Object.Value != null) {
Console.WriteLine(myText.Object.Value.ToString());
}
}
}
So make sure if you are dealing with pasted objects that you check the value property and not the TextRange property.
If you know the name of the Text Box, you can reference it this way:
ActiveSheet.Shapes.Range(Array("TextBox 1")).Select
If you don't know the name, you can use ActiveSheet.Shapes to iterate through all the shapes on the worksheet.
Getting to the actual text in the TextBox is not very straightforward in VBA. The following code iterates through all the Shape objects on the active worksheet:
Dim shp As Shape
Dim myText As String
For Each shp In ActiveSheet.Shapes
myText = shp.TextFrame2.TextRange.Characters.Text
Next
I see that you are working in C#, so the above code will look a little different, but it at least gives you the object model to get to the text inside the TextBox.
Well, the TextBox isn't actually inside any cell (although it may appear to be).
Instead, you have to get it from the Shapes collection in the WorkSheet.

Read data from combined Excel columns/rows using C#

I'm trying to read data from an Excel document in C# using Microsoft's COM Interop.
So far, I'm able to load the document and read some data from it. However, I need to read data from two different columns and output these as json (for a jquery ajax call)
I've made a quick prototype of how my Excel document is structured with the hope that it's a bit easier to explain ;-)
The method I have is called GetExcelDataByCategory(string categoryName) where the categoryName parameter would be used to find which column to get the data from.
So, i.e., if I'm making the call with "Category 2" as parameter, I need to get all the values in the C columns rows and the corresponding dates from the A column, so the output will look like this:
Which then needs to be transformed/parsed into JSON.
I've searched high and low on how to achieve this, but with no luck so far :-( I'm aware that I can use the get_Range() method to select a range, but it seems you need to explicitly tell the method which row and which column to get the data from. I.e.: get_Range("A1, C1")
This is my first experience with reading data from an Excel document, so I guess there's a lot to learn ;-) Is there a way to get the output on my second image?
Any help/hint is greatly appreciated! :-)
Thanks in advance.
All the best,
Bo
This is what I would do:
using Excel = Microsoft.Office.Interop.Excel;
Excel.Application xlApp = new Excel.Application();
Excel.Workbook xlWorkbook = xlApp.Workbooks.Open("path to book");
Excel.Worksheet xlSheet = xlWorkbook.Sheets[1]; // get first sheet
Excel.Range xlRange = xlSheet.UsedRange; // get the entire used range
int numberOfRows = xlRange.Rows.Count;
int numberOfCols = xlRange.Columns.Count;
List<int> columnsToRead = new List<int>();
// find the columns that correspond with the string columnName which
// would be passed into your method
for(int i=1; i<=numberOfCols; i++)
{
if(xlRange.Cells[1,i].Value2 != null) // ADDED IN EDIT
{
if(xlRange.Cells[1,i].Value2.ToString().Equals(categoryName))
{
columnsToRead.Add(i);
}
}
}
List<string> columnValue = new List<string>();
// loop over each column number and add results to the list
foreach(int c in columnsToRead)
{
// start at 2 because the first row is 1 and the header row
for(int r = 2; r <= numberOfRows; r++)
{
if(xlRange.Cells[r,c].Value2 != null) // ADDED IN EDIT
{
columnValue.Add(xlRange.Cells[r,c].Value2.ToString());
}
}
}
This is the code I would use to read the Excel. Right now it reads every column that has the heading (designated by whatever is in the first row) and then all the rows there. It isn't exactly what you asked (it doesn't format into JSON) but I think it is enough to get you over the hump.
EDIT: Looks like there are a few blank cells that are causing problems. For a blank cell, Value2 is null in the Interop, so calling Value2.ToString() on it throws an error. I added code to check that the value isn't null before doing anything with it. It prevents the errors.
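For the JSON step that the code above leaves out, here is a minimal sketch. It uses System.Text.Json from the .NET base library, which is a substitution of mine rather than something the answer uses; CategoryEntry, CategoryJson and ToJson are hypothetical names, and the two input lists stand in for the dates read from column A and the values read from the matching category column.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// One output row: a date from column A and the value from the category column.
public class CategoryEntry
{
    public string Date { get; set; }
    public string Value { get; set; }
}

public static class CategoryJson
{
    // Pairs each date with the value at the same position and serializes
    // the result as a JSON array. Both lists must hold one entry per
    // worksheet row, otherwise dates and values will not line up.
    public static string ToJson(IList<string> dates, IList<string> values)
    {
        var entries = new List<CategoryEntry>();
        int count = Math.Min(dates.Count, values.Count);
        for (int i = 0; i < count; i++)
        {
            entries.Add(new CategoryEntry { Date = dates[i], Value = values[i] });
        }
        return JsonSerializer.Serialize(entries);
    }
}
```

Note that the reading loop above skips null cells, which would shift values relative to their dates. To use this sketch, add an empty string for a blank cell instead of skipping it, so both lists keep one entry per row.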
For Excel parsing and creation you can use ExcelDataReader: http://exceldatareader.codeplex.com/
and for JSON you can use Json.NET: http://json.codeplex.com/
Both are fairly easy to use. Just have a look at the websites.
