Handling sheet changes in the Google Sheets API - C#

I'm trying to handle spreadsheet changes in order to keep a local copy of the data up to date, and I've run into a couple of problems:
The Google Sheets API does not provide any request for checking the last modified time or file version. (Correct me if I'm mistaken.)
Google needs some time to process changes and update the version metadata of the file.
For example:
The file is at version 10.
I send a BatchUpdateRequest with some data.
As soon as that request completes, I check the file version with a Drive API Files.Get request (field "version") and still get the old version 10 (this sequence is sketched below).
If I wait about 15 seconds, the same request returns the correct data, but that is not a solution: the data is updated every minute for each spreadsheet, so waiting would cost far too much time.
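A minimal sketch of that sequence, assuming already-authorized SheetsService and DriveService instances (sheetsService, driveService) and a spreadsheetId string; the batch body is a placeholder:

// Sketch only: send an update, then immediately read the Drive "version" field.
var batch = new Google.Apis.Sheets.v4.Data.BatchUpdateSpreadsheetRequest
{
    Requests = new System.Collections.Generic.List<Google.Apis.Sheets.v4.Data.Request>
    {
        // ... the actual cell update requests go here ...
    }
};
sheetsService.Spreadsheets.BatchUpdate(batch, spreadsheetId).Execute();

// Immediately afterwards, the Drive metadata can still report the old version.
var getRequest = driveService.Files.Get(spreadsheetId);
getRequest.Fields = "version";
var file = getRequest.Execute();
Console.WriteLine(file.Version); // often still the pre-update version right away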
To work around this, I implemented logic that calculates the spreadsheet version locally and compares it after uploading: if the online version is greater than the local version, the spreadsheet is reloaded. But this creates a new problem:
If changes are made to the spreadsheet from several computers at almost the same moment, the local version is incremented on every computer, but Google merges those changes into a single version. For this to work correctly the result would have to be oldVersionNumber + countOfComputersThatMadeChanges, but in fact it is oldVersionNumber + 1. As a result nobody gets the actual spreadsheet data, because the online version never becomes higher than the local one.
So my question is: how can I update spreadsheets when the data is changed from another source?
GoogleSpreadsheetsVersions is filled like this:
var versions = Instance.GoogleSpreadsheetsVersions;
if (!versions.ContainsKey(newTable.SpreadsheetId))
{
    var request = GoogleSpreadsheetsServiceDecorator.Instance.DriveService.Files.Get(newTable.SpreadsheetId);
    request.Fields = "version";
    var response = request.Execute();
    versions.Add(newTable.SpreadsheetId, response.Version);
}
The version comparison itself:
var newInfo = new Dictionary<string, long?>();
foreach (var info in GoogleSpreadsheetsVersions)
{
    try
    {
        // Gets the file version
        var request = GoogleSpreadsheetsServiceDecorator.Instance.DriveService.Files.Get(info.Key);
        request.Fields = "version";
        var response = request.Execute();
        // local version < actual Google version
        if (info.Value < response.Version)
        {
            // setting flag of reloading for each sheet from this file
            foreach (var t in GoogleSpreadsheets.Where(sheet => sheet.SpreadsheetId == info.Key))
                t.IsLoadRequestRequired = true;
        }
        // Refreshing local versions
        newInfo.Add(info.Key, response.Version);
    }
    catch (Exception e) when (e.Message.Contains("File not found"))
    {
        newInfo.Add(info.Key, null);
    }
}
GoogleSpreadsheetsVersions = newInfo;
P.S.:
The version field description from the Google documentation:
A monotonically increasing version number for the file. This reflects every change made to the file on the server, even those not visible to the user.
In my code, a local Spreadsheet class instance represents the data of one sheet in Google. So if one Google spreadsheet contains 10 sheets, there will be 10 Spreadsheet objects in the program.
Possibly helpful: the Drive API Files.Get request fields.


Azure Storage - Download a blob with conditions

I've been having trouble applying read conditions to blobs when downloading them with the Azure Storage SDK.
Basically, what I am trying to do goes like this:
Upload a blob (WORKS)
Download the blob (WORKS)
Get the Etag of the blob with blobRef.GetProperties().Etag (WORKS)
Use the Etag to try to download the blob again expecting a RequestFailedException e where the e.ErrorCode == ConditionNotMet (FAILS)
This is the code:
var condition = new Azure.Storage.Blobs.Models.BlobRequestConditions
{
    IfNoneMatch = new Azure.ETag(previousEtagString),
};
// blobRef is a valid instance of Azure.Storage.Blobs.BlobClient
// target is the file path
blobRef.DownloadTo(target, conditions: condition); // this should throw RequestFailedException
Notes:
When I compare the ETag fetched in step 3 (which is converted to a string) with the one I am sending in step 4, they are reported as equal: blobRef.GetProperties().Etag == new Azure.ETag(blobRef.GetProperties().Etag.ToString()) -> true
I also opened a question in the GitHub repo.
A passing test case with v11:
var blobRef = _blobContainer.GetBlockBlobReference(identifier);
blobRef.DownloadTo(target, AccessCondition.GenerateIfNoneMatchCondition(stringEtag)); //throws StorageException
I just noticed that the blob downloaded the second time does not contain any data; it's empty. The first one has data. Are the read conditions partially working, but not throwing the exception?
This is definitely a bug. I opened an issue in the GitHub repo.
Thanks
Use the Etag to try to download the blob again expecting a RequestFailedException e where the e.ErrorCode == ConditionNotMet
That is not the expected behavior, because the ETag only changes when a blob is updated.
If a blob is not updated, its ETag value remains the same, so if you do not update the blob you can perform the conditional download multiple times without getting a precondition-failure error.
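For contrast, here is a minimal sketch of the If-Match side of the same conditional semantics, reusing blobRef and target from the question; the overwrite step is hypothetical, purely to force the stored ETag to change. Downloading with IfMatch set to a stale ETag is the case that reliably produces RequestFailedException with ErrorCode == ConditionNotMet:

// Sketch only: capture the ETag, change the blob, then download with If-Match.
var etagBeforeUpdate = blobRef.GetProperties().Value.ETag;

// Hypothetical update so the stored ETag changes.
using (var ms = new System.IO.MemoryStream(System.Text.Encoding.UTF8.GetBytes("new content")))
{
    blobRef.Upload(ms, overwrite: true);
}

var staleCondition = new Azure.Storage.Blobs.Models.BlobRequestConditions
{
    IfMatch = etagBeforeUpdate
};

try
{
    blobRef.DownloadTo(target, conditions: staleCondition);
}
catch (Azure.RequestFailedException ex) when (ex.ErrorCode == "ConditionNotMet")
{
    // 412 Precondition Failed: the blob no longer matches the supplied ETag.
}

With IfNoneMatch and an unchanged ETag, plain HTTP semantics call for a 304 Not Modified response with no body, which would at least be consistent with the empty second download noted above.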

Database (Excel) Access Speed - Using Open XML SDK in Visual Studio C# (DOM Approach)

I mostly write number-crunching programs in C# with Visual Studio 2019, where I simply take input data, calculate results and display them. No complicated network or Internet programming. Think of a first- or second-year college programming course from the early 1990s.
For input I was reading data from an Excel file using the following directive:
using Excel = Microsoft.Office.Interop.Excel;
This proved to be very slow when executing the program. I then learned that this way of accessing an Excel file is no longer supported and has been superseded by the Open XML SDK. See the following Microsoft Dev Center page:
https://learn.microsoft.com/en-us/office/open-xml/how-to-parse-and-read-a-large-spreadsheet
For what I want to do, the Document Object Model (DOM) approach seems most appropriate for the thousands of individual Excel cells I want to read as input data. However, the Microsoft Dev Center is certainly not the most user-friendly resource, and the code example it provides for reading an Excel file with the DOM approach writes to a console, which I'm not using. I never did get that code to work.
The long and short of it is that I got my code working using the GetCellValue method:
https://learn.microsoft.com/en-us/office/open-xml/how-to-retrieve-the-values-of-cells-in-a-spreadsheet
However, this GetCellValue method is still taking way too long. I need to read thousands or tens of thousands of Excel input cells in seconds or fractions of a second, not 20 seconds to a minute.
I think an example of the DOM method reading Excel data into an array variable (instead of writing to the console) would help. Can anyone provide such an example?
Below is my code, where I modified the DOM approach example from the Microsoft Dev Center to write values from a source Excel file to a DataGridView instead of the console:
// The DOM approach.
// Note that the code below works only for cells that contain numeric values.
public void ReadExcelFileDOM(string fileName)
{
    using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(fileName, false))
    {
        WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
        WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
        SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
        DataGridView_Vessel.Rows.Clear();
        DataGridView_Vessel.Refresh();
        string text;
        int File_Row = 0;
        int File_Cell = 0;
        foreach (Row r in sheetData.Elements<Row>())
        {
            DataGridView_Vessel.Rows.Add();
            foreach (Cell c in r.Elements<Cell>())
            {
                if (c.CellValue == null)
                {
                    File_Cell++;
                    //continue;
                }
                else
                {
                    text = c.CellValue.Text;
                    if (File_Cell < 12)
                    {
                        DataGridView_Vessel.Rows[File_Row].Cells[File_Cell].Value = text;
                    }
                    File_Cell++;
                }
            }
            File_Row++;
            File_Cell = 0; // reset the column index for the next row
        }
        //Console.WriteLine();
        //Console.ReadKey();
    }
}
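For reference, here is a sketch of the same DOM loop collecting values into an in-memory list instead of a DataGridView. It is untested against your workbook, and it adds a shared-string lookup (the numeric-only caveat in the Dev Center sample comes from skipping that step). Also, if you are calling the documented GetCellValue helper once per cell, note that each call reopens the whole package, which would explain run times of 20 seconds or more; the loop below opens the document once.

// Sketch: read the first worksheet into a list of rows, opening the file once.
// Requires: using System.Collections.Generic; using System.Linq;
// using DocumentFormat.OpenXml.Packaging; using DocumentFormat.OpenXml.Spreadsheet;
public static List<List<string>> ReadExcelFileToList(string fileName)
{
    var rows = new List<List<string>>();
    using (SpreadsheetDocument doc = SpreadsheetDocument.Open(fileName, false))
    {
        WorkbookPart workbookPart = doc.WorkbookPart;
        WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
        SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
        SharedStringTable sst = workbookPart.SharedStringTablePart?.SharedStringTable;

        foreach (Row r in sheetData.Elements<Row>())
        {
            var rowValues = new List<string>();
            foreach (Cell c in r.Elements<Cell>())
            {
                string text = c.CellValue?.Text ?? string.Empty;
                // Shared-string cells store an index into the shared string table,
                // not the text itself.
                if (c.DataType != null && c.DataType.Value == CellValues.SharedString && sst != null)
                    text = sst.ChildElements[int.Parse(text)].InnerText;
                rowValues.Add(text);
            }
            rows.Add(rowValues);
        }
    }
    return rows;
}

One caveat: completely empty cells are usually omitted from the XML, so if column position matters you may need to parse each cell's CellReference to keep values aligned.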

Data import via Management API successful, but data for custom dimensions does not show

I am trying to import data for a custom dimension in Google Analytics through the .NET client library. In Google Analytics, when I view the uploads for a data set under Admin > Data Import > Manage Uploads, it says my uploads are successful, but the data for the custom dimension doesn't show up in my reports. Right now, I am just using the custom dimension to set the category for an article.
Here is how I am uploading through the .NET client library.
string accountId = "***";
string webPropertyId = "***";
string customDataSourceId = "***";
string contentType = "application/octet-stream";
IUploadProgress progress;
using (var dataStream = CreateArticleCsvStream(articles))
{
    var fs = File.Create("test.csv");
    dataStream.CopyTo(fs);
    fs.Close();
    progress = service.Management.Uploads.UploadData(accountId, webPropertyId, customDataSourceId, dataStream, contentType).Upload();
}
if (progress.Status == UploadStatus.Failed)
{
    throw progress.Exception;
}
Here is the content of test.csv:
ga:pagePath,ga:dimension1
/path/to/page/,"MyCategory"
When I download the file from the data set, I get the same file as test.csv; it just has a random filename assigned to it.
I found this other question similar to mine, but there was no solution posted. Any help would be appreciated.
I have also waited over 24 hours, but still nothing.
It took a few days of trial and error but I finally found the solution.
The first thing to check is that your website's URL is correct under Admin > View Settings. We had ours set up as my.domain.com/path/to/site when it should have just been my.domain.com. (We are using SharePoint, which is why /path/to/site was appended to the site URL.)
The second thing to check is that your key/pagePath entries are all correct. In our case, we had an extra forward slash at the end of the URL. For some reason, Google Analytics displays the trailing slash in reports but does not actually store it in the pagePath.
Another source of error may be capitalization. It seems that GA applies filters after the data has been processed. If you add the lowercase/uppercase filter, notice that it only affects how the URLs are displayed in your reports; behind the scenes, GA still stores the URL with whatever capitalization the hit originally came in with. For example, if the URL on your site is my.domain.com/path/to/PAGE.aspx and you apply the lowercase filter, the pagePath will display in your reports as /path/to/page.aspx. But if you use the lowercase value in your CSV import, the data will not join. You must use the pagePath as it appears on your site (/path/to/PAGE.aspx in this case).
It would be nice if Google provided some log output when it tries to process and join the uploaded data with the existing data, rather than just reporting that the upload was successful even though the processing/joining stage may fail.
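For what it's worth, the Management API does expose a per-upload status and an errors collection via the uploads list call, although, as noted above, it can still report success even when the join produces no data. A minimal sketch, assuming the same service, accountId, webPropertyId and customDataSourceId as in the question:

// Sketch: list recent uploads for the data set and dump their status/errors.
// Assumes a Google.Apis.Analytics.v3 AnalyticsService instance named service.
var uploads = service.Management.Uploads
    .List(accountId, webPropertyId, customDataSourceId)
    .Execute();

foreach (var upload in uploads.Items)
{
    Console.WriteLine(upload.Id + ": " + upload.Status);
    if (upload.Errors != null)
    {
        foreach (var error in upload.Errors)
            Console.WriteLine("  error: " + error);
    }
}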

PdfTextExtractor.GetTextFromPage suddenly giving empty string

We've been using the iTextSharp libraries for a couple of years now within an SSIS process to read some values out of a set of PDF exam documents. Everything ran nicely until this week, when we suddenly started getting an empty string back from the PdfTextExtractor.GetTextFromPage method. I'll include the code here:
// Read the data from the blob column where the PDF exists
byte[] byteBuffer = Row.FileData.GetBlobData(0, (int)Row.FileData.Length);
using (var pdfReader = new PdfReader(byteBuffer))
{
    // Here is the important stuff
    var extractStrategy = new LocationTextExtractionStrategy();
    // This call will extract the page with the proper data on it depending on the exam type
    // 1-page exams = NBOME - need to read first page for exam result data
    // 2-page exams = NBME  - need to read second page for exam result data
    // The next two statements utilize this construct.
    var vendor = pdfReader.NumberOfPages == 1 ? "NBOME" : "NBME";
    // *** THIS NEXT LINE GIVES THE EMPTY STRING ***
    var newText = PdfTextExtractor.GetTextFromPage(pdfReader, pdfReader.NumberOfPages == 1 ? 1 : 2, extractStrategy);
    var stringList = newText.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.None);
    var fileParser = FileParseFactory.GetFileParse(stringList, vendor);
    // Populate our output variables
    Row.ParsedExamName = fileParser.GetExamName(stringList);
    Row.DateParsed = DateTime.Now;
    Row.ParsedId = fileParser.GetStudentId(stringList);
    Row.ParsedTestDate = fileParser.GetTestDate(stringList);
    Row.ParsedTestDateString = fileParser.GetTestDateAsString(stringList);
    Row.ParsedName = fileParser.GetStudentName(stringList);
    Row.ParsedTotalScore = fileParser.GetTestScore(stringList);
    Row.ParsedVendor = vendor;
}
This does not happen for all PDFs, by the way. To explain further: we read in exam files. One of the exam types (NBME) still reads just fine, but the other type (NBOME) does not, even though prior to this week the NBOME files were read fine as well.
This leads me to think it is an internal format change in the PDF files themselves.
Another bit of information: the pdfReader itself has data - I can get a byte[] array of the contents - but the call to extract text simply returns an empty string.
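A quick way to check whether the affected pages contain any extractable text at all is to dump the per-page text length with a couple of strategies; a diagnostic sketch using the same iTextSharp 5.x APIs as above:

// Diagnostic sketch: if both strategies return zero characters for a page,
// the page most likely has no text layer to extract.
using (var reader = new PdfReader(byteBuffer))
{
    for (int page = 1; page <= reader.NumberOfPages; page++)
    {
        var locationText = PdfTextExtractor.GetTextFromPage(reader, page, new LocationTextExtractionStrategy());
        var simpleText = PdfTextExtractor.GetTextFromPage(reader, page, new SimpleTextExtractionStrategy());
        Console.WriteLine("Page " + page + ": location=" + locationText.Length + " chars, simple=" + simpleText.Length + " chars");
    }
}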
I'm sorry I'm not able to show any exam data or files - that information is sensitive.
Has anybody seen something like this? If so, any possible solutions?
Well, we have found our answer. The user was originally going to the NBOME web site and downloading the PDF exam result files to import into my parsing system. As I said, this worked for quite some time. Recently (this week), however, the user stopped downloading the files and instead used a PDF printing feature to print the PDFs to new PDF files. When she did that, the problem occurred.
Bottom line: it looks like printing the PDF to PDF may have been injecting some characters, or otherwise changing something under the covers, that caused the read via iTextSharp not to fail but to return an empty string. She should have just continued downloading the files directly.
Thanks to those who offered some comments!

Azure BLOB possible bug - Random wrong file

So, I know it is kind of crazy to report a bug at this point in Azure's life cycle, but I'm out of options. Here we go.
We have a service where you can upload files and a client that downloads them. The blob storage holds about 27 GB of data.
On a few occasions our users reported that some files came back wrong, so we checked our MVC route to see if anything was wrong there and found nothing.
So we created a simple console app that loops the download:
public static void Main()
{
    var firstHash = string.Empty;
    var client = new System.Net.WebClient();
    for (int i = 0; i < 5000; i++)
    {
        try
        {
            var date = DateTime.Now.ToString("HH-mm-ss-ffff");
            var destination = @"C:\Users\Israel\Downloads\RO65\BLOB - RO65 -" + date + ".rfa";
            client.DownloadFile("http://myboxfree.blob.core.windows.net/public/91fe9d90-71ce-4036-b711-a5300159abfa.rfa", destination);

            string hash = string.Empty;
            using (var md5 = MD5.Create())
            {
                using (var stream = File.OpenRead(destination))
                {
                    hash = Convert.ToBase64String(md5.ComputeHash(stream));
                }
            }

            if (string.IsNullOrEmpty(firstHash))
                firstHash = hash;
            if (hash != firstHash) hash += " ---------------------------------------------";

            Console.WriteLine("i: " + i.ToString() + " = " + hash);
        }
        catch { }
    }
}
So here is the result - every now and then it downloads the wrong file:
The first 1,000 downloads were OK - the right file. Then, out of the blue, the blob returns a different file, and then it goes back to normal.
The only things the two files have in common are the extension and the file size in bytes. The hash is (of course) different.
Any thoughts?
I have tried to rerun your sample code and wasn't able to repro.
Questions:
For the two different versions of the file you are seeing downloaded, have you compared the contents of the two files? I think you said two completely different blobs were being retrieved, but I wanted to verify that. How large is the delta between the two files?
Are you using RA-GRS with the client library's read-from-secondary retry option, meaning a network glitch could result in the read coming from the secondary region?
Suggestions:
Can you track the ETag of the retrieved files? This lets you check whether the blob has changed since you first started reading it (see the sketch after these suggestions).
The Storage Service does let you explicitly validate the integrity of your objects and check whether they have been modified in transit, for example due to network issues. See the Azure Storage MD5 overview for more information. The simplest approach, however, might be to use HTTPS, as these validations are already built in.
Can you also try to repro using HTTPS and let me know if that helps?
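To act on the ETag suggestion within the same test loop, here is a minimal sketch that records the ETag (and Content-MD5, if the service returns one) of each response. WebClient exposes the headers of the last response it received; blobUrl and destination stand in for the values used in the loop above:

// Sketch: after each download, log the response ETag and Content-MD5 so a
// "wrong file" iteration can be correlated with a different ETag.
var client = new System.Net.WebClient();
client.DownloadFile(blobUrl, destination);

var etag = client.ResponseHeaders?["ETag"];
var contentMd5 = client.ResponseHeaders?["Content-MD5"];
Console.WriteLine("ETag: " + etag + ", Content-MD5: " + contentMd5);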
