Parsing cell value of Excel spreadsheet - c#

I parse the cell at address A2. This returns the value 3 instead of the expected Category 1.
test.xlsx
using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using X = DocumentFormat.OpenXml.Spreadsheet;
namespace DotNetSandbox.SO
{
    public class IncorrectCellValue
    {
        public static void ParseCellValue()
        {
            using SpreadsheetDocument doc = SpreadsheetDocument.Open(@"c:\temp\test.xlsx", false);
            X.Sheet sheet = doc.WorkbookPart.Workbook.Descendants<X.Sheet>().First();
            WorksheetPart wsPart = (WorksheetPart)doc.WorkbookPart.GetPartById(sheet.Id);
            X.Cell cell = wsPart.Worksheet.Descendants<X.Cell>().First(c => c.CellReference == "A2");
            string cellValue = cell.CellValue.Text;
            Console.WriteLine(cellValue);
            Console.ReadKey();
        }
    }
}
OUTPUT:
3
Target: .NET 5
DocumentFormat.OpenXml version: 2.13.0
Am I doing something wrong, or is this a library bug?

Use this method
public static string GetCellValue(string fileName,
    string addressName, string sheetName = "")
{
    string value = null;
    // Open the spreadsheet document for read-only access.
    using (SpreadsheetDocument document =
        SpreadsheetDocument.Open(fileName, false))
    {
        // Retrieve a reference to the workbook part.
        WorkbookPart wbPart = document.WorkbookPart;
        // Find the sheet with the supplied name, or fall back to the
        // first sheet when no name is given.
        var theSheets = wbPart.Workbook.Descendants<Sheet>();
        Sheet theSheet = string.IsNullOrEmpty(sheetName) ? theSheets.FirstOrDefault() : theSheets.FirstOrDefault(x => x.Name == sheetName);
        // Throw an exception if there is no sheet.
        if (theSheet == null)
        {
            throw new ArgumentException("sheetName");
        }
        // Retrieve a reference to the worksheet part.
        WorksheetPart wsPart =
            (WorksheetPart)(wbPart.GetPartById(theSheet.Id));
        // Use its Worksheet property to get a reference to the cell
        // whose address matches the address you supplied.
        Cell theCell = wsPart.Worksheet.Descendants<Cell>().
            Where(c => c.CellReference == addressName).FirstOrDefault();
        // If the cell does not exist or is empty, value stays null.
        if (theCell != null && theCell.InnerText.Length > 0)
        {
            value = theCell.InnerText;
            // If the cell represents an integer number, you are done.
            // For dates, this code returns the serialized value that
            // represents the date. The code handles strings and
            // Booleans individually. For shared strings, the code
            // looks up the corresponding value in the shared string
            // table. For Booleans, the code converts the value into
            // the words TRUE or FALSE.
            if (theCell.DataType != null)
            {
                switch (theCell.DataType.Value)
                {
                    case CellValues.SharedString:
                        // For shared strings, look up the value in the
                        // shared strings table.
                        var stringTable =
                            wbPart.GetPartsOfType<SharedStringTablePart>()
                            .FirstOrDefault();
                        // If the shared string table is missing, something
                        // is wrong. Return the index that is in
                        // the cell. Otherwise, look up the correct text in
                        // the table.
                        if (stringTable != null)
                        {
                            value =
                                stringTable.SharedStringTable
                                .ElementAt(int.Parse(value)).InnerText;
                        }
                        break;
                    case CellValues.Boolean:
                        switch (value)
                        {
                            case "0":
                                value = "FALSE";
                                break;
                            default:
                                value = "TRUE";
                                break;
                        }
                        break;
                }
            }
        }
    }
    return value;
}
This is where you got stuck:
If the cell represents an integer number, you are done.
For dates, this code returns the serialized value that
represents the date. The code handles strings and
Booleans individually. For shared strings, the code
looks up the corresponding value in the shared string
table. For Booleans, the code converts the value into
the words TRUE or FALSE.
I was able to get Category 1 by running this code:
var cellValue = GetCellValue(@"c:\test.xlsx", "A2");
Microsoft Doc
Notice that I changed the original method to get the first sheet if you do not pass the sheet name to the method.
What is Shared String:
To optimize the use of strings in a spreadsheet, SpreadsheetML stores a single instance of the string in a table called the shared string table. The cells then reference the string by index instead of storing the value inline in the cell value. Excel always creates a shared string table when it saves a file.
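To make the index lookup concrete, here is a minimal sketch that does not use the Open XML SDK at all. It parses two hand-written XML fragments with LINQ to XML; they are hypothetical stand-ins for xl/sharedStrings.xml and a worksheet part, not the contents of test.xlsx. The class and method names are invented for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class SharedStringSketch
{
    private static readonly XNamespace Ns =
        "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Hypothetical fragment shaped like xl/sharedStrings.xml.
    private const string SharedStringsXml = @"
<sst xmlns='http://schemas.openxmlformats.org/spreadsheetml/2006/main'>
  <si><t>Name</t></si>
  <si><t>Amount</t></si>
  <si><t>Notes</t></si>
  <si><t>Category 1</t></si>
</sst>";

    // Hypothetical fragment shaped like a worksheet part.
    // t='s' marks A2 as a shared string, so <v>3</v> is an index, not text.
    private const string SheetXml = @"
<worksheet xmlns='http://schemas.openxmlformats.org/spreadsheetml/2006/main'>
  <sheetData>
    <row r='2'>
      <c r='A2' t='s'><v>3</v></c>
      <c r='B2'><v>42</v></c>
    </row>
  </sheetData>
</worksheet>";

    public static string Resolve(string cellReference)
    {
        // Shared string items in document order; the position is the index.
        string[] table = XDocument.Parse(SharedStringsXml).Root
            .Elements(Ns + "si")
            .Select(si => si.Value)
            .ToArray();

        XElement cell = XDocument.Parse(SheetXml)
            .Descendants(Ns + "c")
            .First(c => (string)c.Attribute("r") == cellReference);

        string raw = (string)cell.Element(Ns + "v");

        // Only cells typed "s" go through the table; others keep the raw value.
        return (string)cell.Attribute("t") == "s" ? table[int.Parse(raw)] : raw;
    }
}
```

With these fragments, Resolve("A2") follows index 3 into the table, which is the same step the SharedString case in GetCellValue performs, while Resolve("B2") returns the stored number text unchanged.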

Related

In C# how do I go through a Google Sheets document and write into a specific cell

Following a tutorial, I have set up everything that needs to be set up for Google Sheets API v4. In my Google Sheets document, I have names of students in the first column, and in my second column I want to put their GPA. In my code, I made two variables that the user inputs, string name and string gpa. I want to go through column A, look for that name and insert that GPA next to it. I know I should probably use a for loop to go through the column, and compare every cell with the string the user typed, but nothing I tried so far worked.
I wrote a simple method that can get entries, for now it only prints but that can easily be changed:
static void ReadEntries()
{
    var range = $"{sheet}!A1:F10";
    var request = service.Spreadsheets.Values.Get(SpreadsheetId, range);
    var response = request.Execute();
    var values = response.Values;
    if(values != null && values.Count > 0)
    {
        foreach(var row in values)
        {
            Console.WriteLine("{0} | {1}", row[0], row[1]);
        }
    }
    else
    {
        Console.WriteLine("No data found");
    }
}
and a method that can update a specific cell:
static void UpdateEntry()
{
    var range = $"{sheet}!B2"; //example
    var valueRange = new ValueRange();
    var objectList = new List<object>() { "updated" };
    valueRange.Values = new List<List<object>> { objectList };
    var updateRequest = service.Spreadsheets.Values.Update(valueRange, SpreadsheetId, range);
    updateRequest.ValueInputOption = SpreadsheetsResource.ValuesResource.AppendRequest.ValueInputOptionEnum.USERENTERED;
    var updateResponse = updateRequest.Execute();
}
EDIT: I need help with making a for loop to go through my A column and find the student with the same name. I know how to update a cell. I just don't know how to find a cell that needs updating.
Sounds like you are very close. You already have the value you are searching for in row[0] in the loop, so all you need is to track the row number as you loop.
if (values != null && values.Count > 0)
{
    int rowNo = 0;
    foreach (var row in values)
    {
        rowNo++;
        Console.WriteLine("{0} | {1}", row[0], row[1]);
        if (row[0].ToString() == "John")
        {
            string rangeToUpdate = $"{sheet}!B{rowNo}:B{rowNo}";
            ...
        }
    }
}
You could also change from using a foreach to a standard for loop.
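A rough sketch of that for-loop variant, kept independent of the Google client library so it can be run on its own. RowFinder, FindGpaRange and the firstRow parameter are hypothetical names; the values argument has the same shape as response.Values in the question. It assumes the requested range starts at row firstRow (1 for A1:F10) and that a blank row may come back as an empty list.

```csharp
using System;
using System.Collections.Generic;

public static class RowFinder
{
    // Returns the A1-notation address of the column B cell next to the first
    // row whose column A value equals name, or null when nothing matches.
    public static string FindGpaRange(
        IList<IList<object>> values, string sheet, string name, int firstRow = 1)
    {
        if (values == null)
        {
            return null;
        }

        for (int i = 0; i < values.Count; i++)
        {
            IList<object> row = values[i];

            // Guard against rows with no cells before touching row[0].
            if (row.Count == 0)
            {
                continue;
            }

            if (string.Equals(row[0]?.ToString(), name, StringComparison.OrdinalIgnoreCase))
            {
                // The list index is zero-based; sheet rows start at firstRow.
                return $"{sheet}!B{firstRow + i}";
            }
        }

        return null;
    }
}
```

The returned string can be passed as the range to the update request from the question. The case-insensitive comparison is a design choice for user-typed names; switch to StringComparison.Ordinal if exact matching is needed.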
I'm not experienced in the .NET client library of the Sheets API.
However, having used the Sheets API with the node and python client libraries, I can point you to the documentation you should follow. This is the official API documentation, with code examples for each language having a Google-provided client library.
For example, here is the spreadsheets.values.update documentation that you use, with a code example for C#.
On to the question then:
According to the json representation of a ValueRange, ValueRange.Range does not seem optional even though it is redundant. You might need to add ValueRange.Range = range; in your code.
Plus, you are using SpreadsheetsResource.ValuesResource.AppendRequest instead of SpreadsheetsResource.ValuesResource.UpdateRequest in the definition of your ValueInputOption.
Let me know if it helped!
Update
This also seems to be a duplicate of Update a cell with C# and Sheets API V4

Handling large Excel files with shared strings

Using OpenXML, Microsoft recommends using the SAX approach:
https://msdn.microsoft.com/en-us/library/office/gg575571.aspx
So rather than loading the whole document DOM in memory, you can read the file serially with OpenXmlReader. For example:
WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
OpenXmlReader reader = OpenXmlReader.Create(worksheetPart);
string text;
while (reader.Read())
{
    if (reader.ElementType == typeof(CellValue))
    {
        text = reader.GetText();
        Console.Write(text + " ");
    }
}
But this kinda falls down when you have cells with the SharedString data type. Those are stored separate from the sheet data in the shared string table and, as far as I can see, there's no real way to avoid having to load the entire shared string table. For example, I can do this:
var sharedStrings = wbPart.SharedStringTablePart.SharedStringTable.Cast<SharedStringItem>()
    .Select(i => i.Text.Text).ToArray();
And then I can do something like:
var row = reader.LoadCurrentElement() as Row;
var cells = row.Descendants<Cell>();
var cellValues = cells.Select(c => c.DataType != null
    && c.DataType == CellValues.SharedString ?
    sharedStrings[int.Parse(c.CellValue.Text)] : c.CellValue.Text).ToArray();
Which works, but I had to load the entire shared string table, which could be very large if the file has a lot of unique strings. Is there a more efficient way to handle looking up the shared strings as you process each row of the file?

DataTable.Select throws EvaluateException: Cannot perform '=' operation on System.Double and System.String

I am importing data from multiple spreadsheets into a single database table.
To start, I've loaded the entire Excel file into a data set, 1 page = 1 table. I then looped over all the pages and added the columns to the new combination table. Finally, I will take each row, find the corresponding row in the other pages and copy the data into a new row in the new table. I need to use a combination of 3 columns to perform this match: 'Brand', 'Model' and 'Yr'. Here is the loop for the last step.
//Add Data
foreach (DataRow drBase in tableSet.Tables[0].Rows)
{
    List<DataRow> drSelect = new List<DataRow>(); //selected rows for specific bike
    foreach(DataTable dt in tableSet.Tables)
    {
        string expression="";
        foreach (string colName in joiningCols)
        {
            if (drBase[colName].ToString() == "") break;
            if (!string.IsNullOrWhiteSpace(expression))
            {
                expression += " and ";
            }
            expression += string.Format("{0} = '{1}'",colName, drBase[colName].ToString().Trim());
        }
        DataRow[] temp= { };
        if (!string.IsNullOrWhiteSpace(expression))
        {
            temp = dt.Select(expression); //This is the line throwing the exception
        }
        if (temp.Length == 1)
        {
            drSelect.Add(temp[0]);
            //Debug.Print(string.Format("Match found {0} on {1}", expression, dt.TableName));
        }
        else
        {
            Debug.Print(string.Format("Incorrect number of matches ({2}) to <{1}> on Table[{0}]", dt.TableName, expression, temp.Length));
            continue;
        }
    }
    if (drSelect.Count == 2)
    {
        DataRow current = resultTable.NewRow();
        for (int t = 0; t < tableSet.Tables.Count; t++)
        {
            foreach (DataColumn c in tableSet.Tables[t].Columns)
                current[c.ColumnName] = drSelect[t][c.ColumnName];
        }
        resultTable.Rows.Add(current);
    }
}
The exception is:
EvaluateException was unhandled
An unhandled exception of type 'System.Data.EvaluateException' occurred in System.Data.dll
Additional information: Cannot perform '=' operation on System.Double and System.String.
The value of 'expression' during the exception is "Brand = 'BMW' and Yr = '1997-2000' and Model = 'F 650'"
The error, and my research, say that I should enclose all the values as strings, which I've done. None of the columns in Excel use a special format, so all should default to text. The only column that may contain only numbers is the year, but since it was able to do several iterations before stopping I do not believe the error is pointing to another row.
After some testing, where I broke up the expression into parts (A, B and C), it only crashed when selecting on A and C ("Brand = 'BMW' and Yr = '1997-2000'"), not when I select by each clause individually.
What am I missing? Where is this double it is trying to compare?
Try
"Yr >= 1997 and Yr <= 2000"
You're still fighting a "guess the datatype" battle.
I still suggest the transform to an ICollection<MyHolderObject>
"Excel File"
Treat everything as a string. I put my results into a simple class with settable string properties, each paired with a corresponding get-only property of the correct datatype.
Then I can run queries/filters against it.
public class ExcelImportRow
{
    public string SalaryString { get; set; }
    /* if zero is ok */
    public double Salary
    {
        get
        {
            double returnValue = 0.00D;
            if (!string.IsNullOrEmpty(this.SalaryString))
            {
                double.TryParse(this.SalaryString, out returnValue);
            }
            return returnValue;
        }
    }
    public string TaxRateString { get; set; }
    /* if zero is not ok */
    public decimal? TaxRate
    {
        get
        {
            decimal? returnValue = null;
            if (!string.IsNullOrEmpty(this.TaxRateString))
            {
                decimal tryParseResult;
                if (decimal.TryParse(this.TaxRateString, out tryParseResult))
                {
                    returnValue = tryParseResult;
                }
            }
            return returnValue;
        }
    }
}
Excel is not a database. In Excel any worksheet cell can contain any type of data.
I'm not familiar with Excel Data Reader, but it likely uses the same sort of fuzzy logic as the Excel ODBC/OLE DB drivers, i.e. there is a RowsToScan parameter that tells the driver how many non-heading rows it should scan to guess a data type for each column, the default being 8 rows.
REF: Excel ODBC Driver May Determine Wrong Data Type
What's probably happening with your Excel file is that the first few rows contain numeric data in the Yr column, so Excel Data Reader is inferring the Yr data type to be Double (and it/something else is setting that data type on your DataTable's Yr column). When you get to the row containing '1997-2000', that value cannot be converted to a Double, hence your exception: the DataTable[Yr] column is of type Double, your comparison value is of type String.
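One way to sidestep the mismatch, sketched under the assumption that the inferred column types cannot be changed at load time, is to build each filter clause from the column's actual DataType instead of always quoting the value. FilterBuilder and BuildClause are hypothetical names, and the sketch only distinguishes string columns from numeric ones.

```csharp
using System;
using System.Data;
using System.Globalization;

public static class FilterBuilder
{
    // Builds "[column] = value" for DataTable.Select, quoting the value only
    // when the column holds strings. Returns null when the value cannot
    // possibly match a numeric column (for example "1997-2000").
    public static string BuildClause(DataColumn column, string value)
    {
        if (column.DataType == typeof(string))
        {
            // Escape embedded single quotes by doubling them.
            return $"[{column.ColumnName}] = '{value.Replace("'", "''")}'";
        }

        if (double.TryParse(value, NumberStyles.Float, CultureInfo.InvariantCulture, out double number))
        {
            // Numeric literal, written without quotes and without culture-specific separators.
            return $"[{column.ColumnName}] = {number.ToString(CultureInfo.InvariantCulture)}";
        }

        return null;
    }
}
```

In the loop from the question, a null clause would mean "this table cannot contain a match", so the Select call can be skipped for that table instead of throwing. Date and Boolean columns would need their own branches.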

Check all values of specific column to define the datatype in DGV

Is there a fast way to check whether all the values in a specific column of a DataTable/DataGridView are DateTime values or numbers?
I'm trying to write a generic method that applies a specific format to any column in a DGV.
The data comes from text, Excel, or XML files with no predefined data types.
Thanks!
You can bury the loop in an extension method. The end result will need a loop somewhere, though, even if the loop is hidden inside Linq operations. For example, you could write this extension method:
public static void ApplyColumnFormatting(this System.Data.DataTable table, string column, Action formatDateTime, Action formatNumeric)
{
    bool foundNonDateTime = false;
    bool foundNonNumeric = false;
    DateTime dt;
    Double num;
    foreach (System.Data.DataRow row in table.Rows)
    {
        string val = row[column] as string;
        // Optionally skip this iteration if the value is not a string, depending on your needs.
        if (val == null)
            continue;
        // Check for non-DateTime, but only if we haven't already ruled it out
        if (!foundNonDateTime && !DateTime.TryParse(val, out dt))
            foundNonDateTime = true;
        // Check for non-Numeric, but only if we haven't already ruled it out
        if (!foundNonNumeric && !Double.TryParse(val, out num))
            foundNonNumeric = true;
        // Leave loop if we've already ruled out both types
        if (foundNonDateTime && foundNonNumeric)
            break;
    }
    if (!foundNonDateTime)
        formatDateTime();
    else if (!foundNonNumeric)
        formatNumeric();
}
Then you can call it like this:
System.Data.DataTable table = ...;
table.ApplyColumnFormatting("Column_Name",
    () => { /* Apply DateTime formatting here */ },
    () => { /* Apply Numeric formatting here */ }
);
This is fast in the sense that it does not check any more rows than necessary, and it does not continue checking a given type after it has been ruled out.

Get Tables (workparts) of a sheet of excel by OpenXML SDK

I have 3 tables in one sheet of an Excel file, and I use the OpenXML SDK to read the file, like this:
SpreadsheetDocument document = SpreadsheetDocument.Open(/*read it*/);
foreach(Sheet sheet in document.WorkbookPart.Workbook.Sheets)
{
    //I need each table or work part of sheet here
}
As you can see, I can get each sheet of the workbook, but how can I get the tables in each sheet? I need to iterate over my 3 tables. Any suggestions?
Does this help?
// true for editable
using (SpreadsheetDocument xl = SpreadsheetDocument.Open("yourfile.xlsx", true))
{
    foreach (WorksheetPart wsp in xl.WorkbookPart.WorksheetParts)
    {
        foreach (TableDefinitionPart tdp in wsp.TableDefinitionParts)
        {
            // for example
            // tdp.Table.AutoFilter = new AutoFilter() { Reference = "B2:D3" };
        }
    }
}
Note that the actual cell data is not in the Table object, but in SheetData (under Worksheet of the WorksheetPart). Just so you know.
You can get a specific table from the Excel file. Adding to the answer of @Vincent:
using (SpreadsheetDocument document = SpreadsheetDocument.Open("yourfile.xlsx", true))
{
    var workbookPart = document.WorkbookPart;
    // Compare names case-insensitively; upper-casing only one side never matches mixed-case text.
    var relationshipId = workbookPart.Workbook.Descendants<Sheet>()
        .FirstOrDefault(s => string.Equals(s.Name.Value.Trim(), "your sheetName", StringComparison.OrdinalIgnoreCase))?.Id;
    var worksheetPart = (WorksheetPart)workbookPart.GetPartById(relationshipId);
    TableDefinitionPart tableDefinitionPart = worksheetPart.TableDefinitionParts
        .FirstOrDefault(r =>
            string.Equals(r.Table.Name.Value, "your Table Name", StringComparison.OrdinalIgnoreCase));
    QueryTablePart queryTablePart = tableDefinitionPart.QueryTableParts.FirstOrDefault();
    Table excelTable = tableDefinitionPart.Table;
    var newCellRange = excelTable.Reference;
    var startCell = newCellRange.Value.Split(':')[0]; // apply your own logic to derive the row and column from these values
    var endCell = newCellRange.Value.Split(':')[1]; // then use them to extract the cell values with regular Open XML calls
}
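The "own logic" for turning startCell and endCell into row and column numbers could look something like the sketch below. CellReferenceParser and its methods are hypothetical helpers, not part of the Open XML SDK, and the sketch assumes plain references such as B2 or AA10 without $ markers or sheet prefixes.

```csharp
using System;
using System.Globalization;

public static class CellReferenceParser
{
    // Splits a reference such as "B2" into a 1-based column and row.
    public static (int Column, int Row) ParseCellReference(string cellReference)
    {
        int i = 0;
        int column = 0;

        // Column letters form a base-26 number where A = 1 and Z = 26.
        while (i < cellReference.Length && char.IsLetter(cellReference[i]))
        {
            column = column * 26 + (char.ToUpperInvariant(cellReference[i]) - 'A' + 1);
            i++;
        }

        // Everything after the letters is the row number.
        int row = int.Parse(cellReference.Substring(i), CultureInfo.InvariantCulture);
        return (column, row);
    }

    // Splits a table reference such as "B2:D3" into its two corner cells.
    public static ((int Column, int Row) Start, (int Column, int Row) End) ParseRange(string reference)
    {
        string[] parts = reference.Split(':');
        return (ParseCellReference(parts[0]), ParseCellReference(parts[1]));
    }
}
```

For the reference "B2:D3", ParseRange yields a start of column 2, row 2 and an end of column 4, row 3. Those bounds can then be used to filter the cells under SheetData, since the table part itself holds no cell data.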
