Google Drive API / GUI exports invalid PDF - not readable by iText7

I'm trying to export multiple Google Docs files to PDF via the Google Drive API and merge them into one file using iText7, but it throws iText.IO.Exceptions.IOException: 'PDF header not found.', apparently because of the odd PDF format of the Google export.
The PDF content generated by Google Drive (viewed in Notepad) does not look like a valid PDF.
The file content starts with 倥䙄ㄭ㐮┊ㄊ instead of something like %PDF-1.4.
The uploaded PDF file opens from Google Drive without any problem, and it is readable even if I write the Stream directly to disk. The file content is exactly the same when I download the file manually through the Google Docs GUI.
Here is my code to export files via API:
var mimeType = "application/pdf";
var file = GetFile(sourceFile);
var pdfRequest = _driveService.Files.Export(sourceFile, mimeType);
var stream = pdfRequest.ExecuteAsStream();
Then I upload the PDF back to Google Drive via its API:
var newFile = new Google.Apis.Drive.v3.Data.File();
newFile.MimeType = mimeType;
newFile.Parents = new List<string>() { targetFolder };
var createRequest = _driveService.Files.Create(newFile, stream, mimeType);
createRequest.SupportsAllDrives = true;
var createResult = createRequest.Upload();
Weirdly enough, the format of the exported PDF is OK when I use
var text = pdfRequest.Execute(); instead of pdfRequest.ExecuteAsStream() (it starts with %PDF-1.7).
But Execute() returns a string instead of a Stream.
Is there any way to get a standard PDF format from the Google Drive API, or to convert it in some way?

The problem was in iText7 itself. It considered the PDF invalid, but it probably just does not support PDFs in iso8859_2 encoding.
I tried PDFsharp instead and everything went smoothly.
I used ExecuteAsStream() from the Google Drive API to get the PDF Stream with no problems at all, so it wasn't at fault.
Thanks for all your tips.
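
A note that may help when debugging similar cases: the garbled characters quoted in the question (倥䙄ㄭ㐮) are what the ASCII bytes of %PDF-1.4 look like when a viewer decodes them as UTF-16LE. For example, 倥 is U+5025, which is the byte pair 0x25 0x50, "%P". So the rendered text in a text editor is not a reliable test. Below is a minimal sketch that checks the raw bytes instead. PdfHeaderCheck and its method names are hypothetical helpers written for this illustration; they are not part of iText7 or the Drive API.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class PdfHeaderCheck
{
    // Returns true if the next five bytes of the stream are the ASCII marker "%PDF-".
    // If the stream is seekable, it is rewound to where it was before the check.
    public static bool StartsWithPdfMarker(Stream stream)
    {
        byte[] expected = Encoding.ASCII.GetBytes("%PDF-");
        byte[] actual = new byte[expected.Length];
        int read = 0;
        while (read < actual.Length)
        {
            int n = stream.Read(actual, read, actual.Length - read);
            if (n == 0) break; // end of stream reached early
            read += n;
        }
        if (stream.CanSeek) stream.Seek(-read, SeekOrigin.Current);
        return read == expected.Length && actual.SequenceEqual(expected);
    }

    // Turns text that was wrongly decoded as UTF-16LE back into the ASCII it came from.
    public static string Utf16MisreadToAscii(string garbled)
    {
        return Encoding.ASCII.GetString(Encoding.Unicode.GetBytes(garbled));
    }
}
```

Calling PdfHeaderCheck.Utf16MisreadToAscii("倥䙄ㄭ㐮") returns "%PDF-1.4". A stream that passes StartsWithPdfMarker has an intact header, so a "PDF header not found" error is more likely to come from how the stream is handed to the reader than from the export itself (for instance, a stream that is not positioned at its start).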

Related

How to find a text in the uploaded PDF file in ASP.NET c#

I want to check whether a given text is present in an uploaded PDF file in ASP.NET C#.
using (MemoryStream str = new MemoryStream(this.docUploadField.FileBytes))
{
using (StreamReader sr = new StreamReader(str, Encoding.UTF8))
{
string line = sr.ReadToEnd();
}
}
I am getting the below as the file content when I read the contents of the file.
Please help me with this.

You will need a PDF reading library.
The best known are:
iText (iTextSharp, for those who remember it): https://github.com/itext/itext7-dotnet
PDFsharp: https://github.com/empira/PDFsharp
and there are many other free options.
With those you open the PDF file, read it, and take the text you need.
Usually they give you a collection of the PDF elements (paragraphs, images, etc.), and you loop through them or use a search function to look for what you need.
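
As a rough sketch of that approach with iText7 (assuming the itext7 NuGet package is referenced), the code below extracts the text of each page and searches it. PdfTextSearch and ContainsText are hypothetical names chosen for this example. The byte array stands in for this.docUploadField.FileBytes from the question.

```csharp
using System;
using System.IO;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;

public static class PdfTextSearch
{
    // Returns true if searchText occurs (case-insensitively) on any page of the PDF.
    public static bool ContainsText(byte[] pdfBytes, string searchText)
    {
        using (var stream = new MemoryStream(pdfBytes))
        using (var pdf = new PdfDocument(new PdfReader(stream)))
        {
            for (int page = 1; page <= pdf.GetNumberOfPages(); page++)
            {
                // Extract the visible text of one page using the default strategy.
                string text = PdfTextExtractor.GetTextFromPage(pdf.GetPage(page));
                if (text.IndexOf(searchText, StringComparison.OrdinalIgnoreCase) >= 0)
                {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Usage would be along the lines of bool found = PdfTextSearch.ContainsText(this.docUploadField.FileBytes, "invoice");. Because the search runs page by page, a phrase split across two pages is not matched, and scanned PDFs without a text layer yield no text at all.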

How do i open jpg as txt or doc with google drive API V3

In the Google Drive web interface I can upload a jpg, open the uploaded jpg as a doc, and any text in the image is extracted into the document. I am trying to use the Google Drive API v3 to replicate this operation (it appears that I have to use C#).
var fileMetadata = new Google.Apis.Drive.v3.Data.File()
{
Name = "image.jpg"
};
FilesResource.CreateMediaUpload request;
using (var stream = new System.IO.FileStream(@"\\cbslnas1\houshare\IT\OCR\image.jpg",
System.IO.FileMode.Open))
{
request = service.Files.Create(fileMetadata, stream, "image/jpeg");
request.Fields = "id";
request.Upload();
}
After credential validation, the above snippet uploads my image.jpg just fine.
However, I have not figured out which API calls to use to get a Google document from the jpg. Honestly, I would prefer to get it into a text/plain file.
The ultimate goal is to use Google Drive as a way to OCR a series of characters out of an image.
Any help would be appreciated.

Trying to convert docx file to another format(pdf) using Drive API

I was trying to convert a .docx file to .pdf using the Drive API, which sounds reasonable since you can do it manually.
Here is my code:
FilesResource.CreateMediaUpload request;
using (var stream = new System.IO.FileStream(@"test.docx",
System.IO.FileMode.Open))
{
request = driveService.Files.Create(
fileMetadata, stream, "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
request.Fields = "id, webViewLink, webContentLink, size";
var x = request.Upload();
Console.WriteLine(x);
}
var file = request.ResponseBody;
Afterwards, I am getting id of this file and trying to do:
var downloadRequest = driveService.Files.Export(file.Id, "application/pdf");
which fails with error: "Export only supports Google Docs"
Of course! I suppose it hasn't yet become a Google Doc; however, this format is supported for conversion, as mentioned here and here.
OK, I've noticed that if you go to Drive and open the file manually, it becomes a Google Doc file and also gets a new ID. The export on this ID works just fine. However, doing something manually isn't an acceptable approach for our needs.
I tried another approach: you can use a direct link with the &export=pdf parameter to convert a Google Doc file.
https://docs.google.com/document/d/FILE_ID/export?format=doc
But passing FILE_ID to that link doesn't work in this case (it works with a "DOC" file just fine). I tried doing something similar to a Stack Overflow answer. No luck.
So, is there any way to trigger the file to become a Google Doc and wait until it converts? Is there any other way?
Thanks in advance!

Thanks to @bash.d I was able to convert from docx to pdf.
Actually, one has to use v2 of the API and its "Insert" method.
https://developers.google.com/drive/v2/reference/files/insert#examples
Use the code from this link and specify
request.Convert = true;
After that I used
var downloadRequest = driveService.Files.Export(file.Id, "application/pdf");
and voilà! It worked! It takes about 30 seconds to convert the file in my case.
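
For readers on v3 of the API, a possible alternative (not what the answer above used) is to ask for the conversion at upload time by setting the target Google Docs MIME type in the file metadata, then exporting the resulting document. The sketch below assumes an already authenticated DriveService from the Google.Apis.Drive.v3 package. DocxToPdf, ConvertFile, and the path parameters are hypothetical names for this example.

```csharp
using System.IO;
using Google.Apis.Drive.v3;
using Google.Apis.Upload;
using DriveFile = Google.Apis.Drive.v3.Data.File;

public static class DocxToPdf
{
    private const string DocxMime =
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document";

    public static void ConvertFile(DriveService driveService, string docxPath, string pdfPath)
    {
        var metadata = new DriveFile
        {
            Name = Path.GetFileNameWithoutExtension(docxPath),
            // Target type: asks Drive to store the upload as a Google Doc.
            MimeType = "application/vnd.google-apps.document"
        };

        FilesResource.CreateMediaUpload upload;
        using (var source = new FileStream(docxPath, FileMode.Open, FileAccess.Read))
        {
            upload = driveService.Files.Create(metadata, source, DocxMime);
            upload.Fields = "id";
            IUploadProgress progress = upload.Upload();
            if (progress.Status != UploadStatus.Completed)
            {
                throw progress.Exception ?? new IOException("Upload did not complete.");
            }
        }

        // The uploaded file is now a Google Doc, so Export is allowed on its ID.
        using (var target = new FileStream(pdfPath, FileMode.Create, FileAccess.Write))
        {
            driveService.Files.Export(upload.ResponseBody.Id, "application/pdf").Download(target);
        }
    }
}
```

The converted Google Doc stays in Drive after the export, so a cleanup step (deleting it by ID) may be needed if only the PDF is wanted.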

How to use C# to display pdf in onenote using com/interop api

I'm new to Stack Overflow, C#, and the OneNote COM interop API. I'm trying to display a PDF file in OneNote using C# and the OneNote COM/interop API (I'd rather not use the REST API).
I am able to display a link to a PDF file using the tag < InsertedFile pathSource="[myfilepath]" preferredName = "[myPreferredName]"> in conjunction with the UpdatePageContent function in the interop API, but this doesn't display the PDF.
I have been able to get my program to display an image in OneNote using the following code to create the image tag:
private XElement createImageTag(Image image)
{
string OneNoteNamespace = "http://schemas.microsoft.com/office/onenote/2013/onenote";
var img = new XElement(XName.Get("Image", OneNoteNamespace));
var data = new XElement(XName.Get("Data", OneNoteNamespace));
data.Value = this.toBase64(image);
img.Add(data);
return img;
}
private string toBase64(Image image)
{
using (var memoryStream = new MemoryStream())
{
image.Save(memoryStream, ImageFormat.Png);
var binary = memoryStream.ToArray();
return Convert.ToBase64String(binary);
}
}
I tried altering this for a PDF instead of an image by converting the PDF to a byte array, converting that to base64, and assigning the result to data.Value in the createImageTag function, but it did not result in a displayed PDF either (presumably because OneNote was expecting an image and not a PDF). I'd like to avoid using third-party libraries or extensions to convert a PDF to an image if possible, and I haven't found any other ways to convert a PDF to an image.
I used ONOMSpy to look for any other OneNote XML tags that might help me display a PDF in OneNote, but didn't see any besides the Image and InsertedFile tags that looked close to doing what I wanted.
So if you could help me either:
1) find an easy way to convert a PDF to an image using C#, or
2) show me how to tell OneNote to display the PDF,
I'd really appreciate it. Thanks!

Uploading a captured image to parse (Parse object)

I am trying to develop an Android application using Xamarin (C#) and Parse, where the user captures a picture and uploads it to Parse. I know how to upload a text file, but as I am new to this I have no idea how to deal with an image file. Can anyone please help? This is how I am uploading a text file:
byte[] data = System.Text.Encoding.UTF8.GetBytes("This is content of the text file");
ParseFile file = new ParseFile("resume.txt", data);
await file.SaveAsync();
ParseObject gameScore = new ParseObject("GameScore");
gameScore["score"] = 0001;
gameScore["playerName"] = " Bob";
gameScore["e"] = file;
await gameScore.SaveAsync();
Can anyone please help me with this problem? Thanks.

Parse has an entire section of docs devoted to dealing with files.
// File is in System.IO
byte[] data = File.ReadAllBytes(path_to_your_image);
ParseFile file = new ParseFile(name_of_your_file, data);
await file.SaveAsync();
// link your file object to your Parse object
gameScore["image"] = file;
Update:
The docs specifically say
It's important that you give a name to the file that has a file
extension. This lets Parse figure out the file type and handle it
accordingly. So, if you're storing PNG images, make sure your filename
ends with .png.
