How to find the data source of a Pivot Table using OpenXML - c#

I am using EPPlus to open and edit an existing Excel document.
The document contains 2 sheets - one with a pivot table (named Pivot) and one with the data (Data!$A$1:$L$9899).
I have a reference to the ExcelPivotTable with the code below, but can't find any properties that relate to the data source.
ExcelPackage package = new ExcelPackage(pivotSpreadsheet);
foreach (ExcelWorksheet worksheet in package.Workbook.Worksheets)
{
    if (worksheet.PivotTables.Count > 0)
    {
        pivotWorkSheetName = worksheet.Name;
        pivotTable = worksheet.PivotTables[0];
    }
}
How do I get the name and range of the source data? Is there an obvious property that I'm missing or do I have to go hunting through some xml?

Pivot tables do not reference their source directly. For performance and abstraction reasons they read from a data cache, and the cache itself stores the reference to the source (which need not be a worksheet range; a pivot table can, for example, point to a web service call). For pivot tables that refer to data elsewhere in the workbook, you can read the source in EPPlus like this:
worksheet.PivotTables[0].CacheDefinition.SourceRange.FullAddress;

If anyone is interested in updating the data source with OpenXML SDK 2.5, here is the code I used.
using (var spreadsheet = SpreadsheetDocument.Open(filepath, true))
{
    PivotTableCacheDefinitionPart ptp = spreadsheet.WorkbookPart.PivotTableCacheDefinitionParts.First();
    // Refresh the pivot table on document load
    ptp.PivotCacheDefinition.RefreshOnLoad = true;
    ptp.PivotCacheDefinition.RecordCount = Convert.ToUInt32(ds.Tables[0].Rows.Count);
    // Cell range used as the data source (+1 for the header row)
    ptp.PivotCacheDefinition.CacheSource.WorksheetSource.Reference = "A1:" + IntToLetters(ds.Tables[0].Columns.Count) + (ds.Tables[0].Rows.Count + 1);
    // The cached records are rebuilt when the pivot table is refreshed
    ptp.PivotTableCacheRecordsPart.PivotCacheRecords.RemoveAllChildren();
    ptp.PivotTableCacheRecordsPart.PivotCacheRecords.Count = 0;
}
// Converts a 1-based column number to its Excel column letters (copied from another Stack Overflow post)
public string IntToLetters(int value)
{
    string result = string.Empty;
    while (--value >= 0)
    {
        result = (char)('A' + value % 26) + result;
        value /= 26;
    }
    return result;
}
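
As a quick sanity check on the reference string built above, here is a minimal, self-contained sketch of the same column-letter conversion. The class name ColumnNames and the helper BuildReference are hypothetical names introduced only for illustration; the conversion loop is the one from the answer.

```csharp
using System;

public static class ColumnNames
{
    // Same loop as IntToLetters in the answer: converts a 1-based column
    // number to Excel column letters (1 -> "A", 26 -> "Z", 27 -> "AA").
    public static string IntToLetters(int value)
    {
        string result = string.Empty;
        while (--value >= 0)
        {
            result = (char)('A' + value % 26) + result;
            value /= 26;
        }
        return result;
    }

    // Hypothetical helper (not part of any library): builds the worksheet
    // reference for a block that starts at A1 and has one header row
    // followed by dataRowCount data rows.
    public static string BuildReference(int columnCount, int dataRowCount)
    {
        return "A1:" + IntToLetters(columnCount) + (dataRowCount + 1);
    }
}
```

For example, 12 columns and 9898 data rows give "A1:L9899", which is the range of the Data sheet described in the question.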

Related

Split large excel into multiple excel file using azure function

I can split the Excel file in a function, but once it is published to Azure Functions it fails with a timeout exception. What should I do, and how can Azure Durable Functions help here?
This is how I'm doing it:
bookOriginal.LoadFromStream(BlobService.GetFileFromBlob(filename));
log.LogInformation("File read from Azure Blob");
Worksheet sheet = bookOriginal.Worksheets[0];
var totalRow = sheet.Rows.Count();
int splitRows = 7000;
int count = totalRow / splitRows;
for (int i = 1; i <= count; i++)
{
    CellRange range1;
    Workbook newBook1 = new Workbook();
    newBook1.CreateEmptySheets(1);
    Worksheet newSheet1 = newBook1.Worksheets[0];
    Model localModel = new Model();
    if (i == 1)
    {
        range1 = sheet.Range[2, 1, splitRows, sheet.LastColumn];
    }
    else
    {
        range1 = sheet.Range[(splitRows * (i - 1)) + 1, 1, splitRows * i, sheet.LastColumn];
    }
    newSheet1.Copy(range1, newSheet1.Range[1, 1]);
    //bookOriginal.SaveToFile("Research and Development.xlsx", ExcelVersion.Version2007);
    localModel.workbookObject = newBook1;
    model.Add(localModel);
}
Console.WriteLine("Ran Completely");
Yes, Durable Functions can help here.
Take a look at the overview: https://learn.microsoft.com/it-it/azure/azure-functions/durable/durable-functions-overview?tabs=csharp
The first and second patterns (function chaining and fan-out/fan-in) could help you. The project structure can be:
A blob-triggered function that downloads the source Excel file, converts it into a single object, and passes that object as input when invoking the orchestrator.
An orchestrator function that deserializes the input object and groups the rows as you did in your code.
Inside a foreach statement, the orchestrator invokes an activity with the current group of rows as its parameter. The activities can run in sequence (pattern 1, awaiting each activity) or in parallel (pattern 2, using Task.WhenAll).
An activity function that converts the row group into an Excel file and, using a blob output binding, uploads it to storage.
WARNING: the Durable Functions documentation says: "Return values are serialized to JSON and persisted to the orchestration history table in Azure Table storage."
So the input model must be serializable as JSON.
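
The row-grouping step in the orchestrator can be sketched as plain C#, independent of any Excel or Durable Functions API. This is only an illustration under stated assumptions: RowChunker and GetRanges are hypothetical names, and row 1 is assumed to be a header. Note that the integer division totalRow / splitRows in the question skips any rows left over after the last full chunk; the sketch below includes that final partial chunk.

```csharp
using System;
using System.Collections.Generic;

public static class RowChunker
{
    // Hypothetical helper: returns inclusive (First, Last) row pairs that
    // cover firstDataRow..lastRow in chunks of at most chunkSize rows.
    // The last pair may be shorter than chunkSize.
    public static List<(int First, int Last)> GetRanges(int firstDataRow, int lastRow, int chunkSize)
    {
        if (chunkSize <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(chunkSize));
        }

        var ranges = new List<(int First, int Last)>();
        for (int first = firstDataRow; first <= lastRow; first += chunkSize)
        {
            // Clamp the end of the chunk to the last row of the sheet.
            int last = Math.Min(first + chunkSize - 1, lastRow);
            ranges.Add((first, last));
        }
        return ranges;
    }
}
```

With data in rows 2 to 15001 and a chunk size of 7000, GetRanges(2, 15001, 7000) yields three pairs: (2, 7001), (7002, 14001) and (14002, 15001). Each pair is small and JSON-serializable, so it could serve as the input of one activity call.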

Linq Update - Cannot perform runtime binding on a null reference

I'm using LinqToExcel along with C# to read data from a MS Excel spreadsheet and then update data records in my MS SQL database.
The Excel file has these headers: COURSE_ID, PROVIDER_COURSE_TITLE
My code is like this:
public class TestDataCourse
{
    [ExcelColumn("PROVIDER_COURSE_TITLE")]
    public string cTitle
    {
        get;
        set;
    }
}
///////////////////////////////
string pathToExcelFile = @"C:\O_COURSES.xlsx";
ConnexionExcel ConxObject = new ConnexionExcel(pathToExcelFile);
//read data from excel
var query1 = (from a in ConxObject.UrlConnexion.Worksheet<TestDataCourse>("O_COURSES")
              select a).Take(2000).ToList();
//Get data from MS SQL database that need updated
var courses = _courseService.GetAllCoursesFromDB().Take(100).ToList();
int count = 0;
foreach (var course in courses)
{
    //Iterate through the excel doc and assign the
    TestDataCourse fakeData = query1.Skip(count).Take(1).FirstOrDefault();
    course.CourseTitle = fakeData.cTitle;
    count++;
}
_courseService.Save();
When I run this code I can see that it updates some of the records in my database, but as execution continues a "Source Not Available" tab opens in Visual Studio along with the error "Cannot perform runtime binding on a null reference".
The null reference exception made me think there might be some null data in the Excel document, so I put this line of code into my foreach loop:
course.CourseTitle = fakeData == null ? "Course Test" : fakeData.cTitle;
But I still get the same problem.
Could anyone please help?
Thanks.

Multiple Pivot tables ClosedXML

I'm using the latest ClosedXML (0.76) on .NET 4.5.1.
I created a worksheet with a table like this:
DataTable Table = ...
var DataWorkSheet = Workbook.Worksheets.Any(x => x.Name == "Data") ?
    Workbook.Worksheets.First(x => x.Name == "Data") :
    Workbook.Worksheets.Add("Data");
int Start = ... // calculate cell start
var Source = DataWorkSheet
    .Cell(Start, 1)
    .InsertTable(Table, Name, true);
var Range = Source.DataRange;
This is done inside a loop (i.e. multiple tables in the "Data" sheet). A problem arises where the generated Excel document can't be opened if multiple pivot tables are created in a separate sheet.
var PivotWorkSheet = Workbook.Worksheets.Add(Name);
var Pivot = PivotWorkSheet
    .PivotTables
    .AddNew(Name, PivotWorkSheet.Cell(1, 1), DataRange);
Any ideas why and how to debug?
This is the same issue as in "ClosedXML - Creating multiple pivot tables".
For the record, it is caused by a ClosedXML bug that requires modifying the library's source code, as described in my answer to the linked question.

OpenXML - after xlsx edit Excel detects errors in file

I have an xlsx file with a pivot table and a filter (pivot field). I’m trying to use OpenXML to:
Open the file
Modify pivot field setting
Save the file
I’m using this simple (and ugly) code to do the job:
OpenSettings settings = new OpenSettings()
{
    MarkupCompatibilityProcessSettings = new MarkupCompatibilityProcessSettings(MarkupCompatibilityProcessMode.ProcessAllParts, FileFormatVersions.Office2010)
};
SpreadsheetDocument spd = SpreadsheetDocument.Open(pathToFile, true, settings);
var pivotTableCacheDefinitionParts = spd.WorkbookPart.PivotTableCacheDefinitionParts;
foreach (PivotTableCacheDefinitionPart item in pivotTableCacheDefinitionParts)
{
    var pivotCacheDefinition = item.PivotCacheDefinition;
    var d = pivotCacheDefinition.CacheFields.Where(x => (x as CacheField).Caption == "Some filter from Excel");
    foreach (var item2 in d)
    {
        if (item2.InnerXml.Contains("Some filter value"))
        {
            var a1 = item2.InnerXml.Replace("><", ">\n<").Split('\n');
            var a2 = a1.Where(x => !x.Contains("Some filter value"));
            string a3 = "";
            foreach (var item3 in a2)
            {
                a3 += item3;
            }
            // There are currently two values to choose from
            a3 = a3.Replace("count=\"2\"", "count=\"1\"");
            item2.InnerXml = a3;
        }
    }
}
After I save my document using:
spd.WorkbookPart.Workbook.Save();
spd.Close();
Excel claims the file is damaged and attempts to repair it. I tried other libraries, but:
ClosedXML - it doesn’t see any data in the pivot tables (perhaps because OLAP is used as the data source? I don’t know)
ExcelDataReader - it doesn’t seem to support pivot tables, or supports them only partially
EPPlus (beta version; the stable version didn’t work with my xlsx file) - it doesn’t seem to support editing pivot fields (pivot table filters)
MS.Office.Interop.Excel - this works (mostly), but since we want to use this functionality on the server side, it is not a recommended solution
What am I doing wrong?

How to read rows from Lucene.Net's index files

I am using Lucene.Net version 2.0.5.
I want to open and display all the records in the index file in a grid (table) format in an ASP.NET web application, and also provide an edit option for each cell in that grid.
But I don't know how to read each row from the index file.
I used the code below:
private List<String> GetIndexTerms(string indexFolder)
{
    List<String> termlist = new List<string>();
    IndexReader reader = IndexReader.Open(indexFolder, false);
    TermEnum terms = reader.Terms();
    while (terms.Next())
    {
        Term term = terms.Term();
        String termText = term.Text();
        int frequency = reader.DocFreq(term);
        termlist.Add(termText);
    }
    reader.Close();
    return termlist;
}
But it returns a flat list of terms, so I am unable to group the data by row (record).
Let me know if there is a way to read the index row by row, or whether I need to update the version of Lucene that I am currently using.
Also, please provide links to any better documentation for Lucene.Net.
You can read all the records/rows (documents, in Lucene terminology) directly from the index without searching:
var reader = IndexReader.Open(dir);
for (int i = 0; i < reader.MaxDoc(); i++)
{
    // Skip documents that are marked as deleted
    if (reader.IsDeleted(i)) continue;
    Document d = reader.Document(i);
    var fieldValuePairs = d.GetFields()
        .Select(f => new {
            Name = f.Name(),
            Value = f.StringValue() })
        .ToArray();
}
PS: v2.0.5 is very old. Try the latest Lucene.Net release.
