Google Document AI c# mime Unsupported input file format

I am trying to upload a PDF for processing to Google's Document AI service, using Google's Google.Cloud.DocumentAI.V1 library for C#. I looked at the GitHub repository and the docs, but there is not much information. The PDF is on the local drive. I converted the PDF to a byte array and then converted that to a ByteString. Then I set the request MIME type to "application/pdf", but it returned this error:
Status(StatusCode="InvalidArgument", Detail="Unsupported input file format.", DebugException="Grpc.Core.Internal.CoreErrorDetailException: {"created":"#1627582435.256000000","description":"Error received from peer ipv4:142.250.72.170:443","file":"......\src\core\lib\surface\call.cc","file_line":1067,"grpc_message":"Unsupported input file format.","grpc_status":3}")
Code:
try
{
//Generate a document
string pdfFilePath = "C:\\Users\\maponte\\Documents\\Projects\\SettonProjects\\OCRSTUFF\\DOC071621-0016.pdf";
var bytes = Encoding.UTF8.GetBytes(pdfFilePath);
ByteString content = ByteString.CopyFrom(bytes);
// Create client
DocumentProcessorServiceClient documentProcessorServiceClient = await DocumentProcessorServiceClient.CreateAsync();
// Initialize request argument(s)
ProcessRequest request = new ProcessRequest
{
ProcessorName = ProcessorName.FromProjectLocationProcessor("*****", "mycountry", "***"),
SkipHumanReview = false,
InlineDocument = new Document(),
RawDocument = new RawDocument(),
};
request.RawDocument.MimeType = "application/pdf";
request.RawDocument.Content = content;
// Make the request
ProcessResponse response = await documentProcessorServiceClient.ProcessDocumentAsync(request);
Document docResponse = response.Document;
Console.WriteLine(docResponse.Text);
}
catch(Exception ex)
{
Console.WriteLine(ex.Message);
}

This is the problem (or at least one problem): you aren't actually loading the file. This code sends the UTF-8 bytes of the path string, not the contents of the PDF:
string pdfFilePath = "C:\\Users\\maponte\\Documents\\Projects\\SettonProjects\\OCRSTUFF\\DOC071621-0016.pdf";
var bytes = Encoding.UTF8.GetBytes(pdfFilePath);
ByteString content = ByteString.CopyFrom(bytes);
You instead want:
string pdfFilePath = "path-as-before";
var bytes = File.ReadAllBytes(pdfFilePath);
ByteString content = ByteString.CopyFrom(bytes);
I'd also note, however, that InlineDocument and RawDocument are alternatives to each other: they belong to the same protobuf oneof, so setting either of them clears the other. Your request creation would be better written as:
ProcessRequest request = new ProcessRequest
{
ProcessorName = ProcessorName.FromProjectLocationProcessor("*****", "mycountry", "***"),
SkipHumanReview = false,
RawDocument = new RawDocument
{
MimeType = "application/pdf",
Content = content
}
};
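The mistake above is easy to make because both `Encoding.UTF8.GetBytes(path)` and `File.ReadAllBytes(path)` return a `byte[]`. A cheap guard is to check that the bytes start with the `%PDF-` signature before building the `RawDocument`, so a wrong payload fails locally with a clear message instead of as a gRPC `InvalidArgument`. This is a minimal sketch using only the base class library; `PdfInput`, `LooksLikePdf` and `LoadPdf` are hypothetical names, and the returned array would still be wrapped with `ByteString.CopyFrom(bytes)` as in the answer. The strict prefix check assumes the file has no bytes before its header.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: loads a PDF from disk and checks its signature
// before the bytes are handed to ByteString.CopyFrom / RawDocument.
public static class PdfInput
{
    // PDF files normally begin with the ASCII bytes "%PDF-".
    private static readonly byte[] Signature = Encoding.ASCII.GetBytes("%PDF-");

    public static bool LooksLikePdf(byte[] bytes)
    {
        if (bytes == null || bytes.Length < Signature.Length)
            return false;
        for (int i = 0; i < Signature.Length; i++)
        {
            if (bytes[i] != Signature[i])
                return false;
        }
        return true;
    }

    public static byte[] LoadPdf(string path)
    {
        // Read the file contents, not the characters of the path string.
        byte[] bytes = File.ReadAllBytes(path);
        if (!LooksLikePdf(bytes))
            throw new InvalidDataException($"'{path}' does not start with a %PDF- header.");
        return bytes;
    }
}
```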

Related

Google Document AI - Invalid argument

I am very new to Google Document AI. I tried to use this code but still get this response. Do you have any idea what I'm doing wrong?
I installed Google.Cloud.DocumentAI.V1 from NuGet.
Status(StatusCode="InvalidArgument", Detail="Request contains an invalid argument.", DebugException="Grpc.Core.Internal.CoreErrorDetailException: {"created":"#1643889903.765000000","description":"Error received from peer ipv4:142.250.186.42:443","file":"......\src\core\lib\surface\call.cc","file_line":1067,"grpc_message":"Request contains an invalid argument.","grpc_status":3}")
public async void Start()
{
Environment.SetEnvironmentVariable("GOOGLE_APPLICATION_CREDENTIALS", @"path-to-json");
try
{
//Generate a document
string pdfFilePath = @"path-to-invoice-pdf";
var bytes = File.ReadAllBytes(pdfFilePath);
ByteString content = ByteString.CopyFrom(bytes);
// Create client
DocumentProcessorServiceClient documentProcessorServiceClient = await DocumentProcessorServiceClient.CreateAsync();
// Initialize request argument(s)
ProcessRequest request = new ProcessRequest
{
ProcessorName = ProcessorName.FromProjectLocationProcessor("ProjectID", "eu", "ProcessorID"),
SkipHumanReview = false,
RawDocument = new RawDocument
{
MimeType = "application/pdf",
Content = content
}
};
// Make the request
ProcessResponse response = await documentProcessorServiceClient.ProcessDocumentAsync(request);
Document docResponse = response.Document;
Console.WriteLine(docResponse.Text);
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
}

Quoted from the Google.Cloud.DocumentAI.V1 documentation:
Note that if you wish to use DocumentProcessorServiceClient other than in the US, you must specify the endpoint when you construct the client. The endpoint is of the form {location}-documentai.googleapis.com, e.g. eu-documentai.googleapis.com. The simplest way to specify the endpoint is to use DocumentProcessorServiceClientBuilder:
DocumentProcessorServiceClient client = new DocumentProcessorServiceClientBuilder
{
Endpoint = "eu-documentai.googleapis.com"
}.Build();
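In the question, the processor name uses location `eu`, but the client comes from `CreateAsync()`, which uses the library's default endpoint. One way to keep the two from drifting apart is to derive both the processor name and the endpoint from a single location string. A minimal sketch, assuming the `{location}-documentai.googleapis.com` pattern quoted above and that `us` can keep the default endpoint; `DocumentAiEndpoints.EndpointFor` is a hypothetical helper, and the builder call appears only in comments because it needs the Google.Cloud.DocumentAI.V1 package.

```csharp
using System;

public static class DocumentAiEndpoints
{
    // Hypothetical helper. Returns null for "us" so the caller can keep the
    // client library's default endpoint; otherwise applies the
    // "{location}-documentai.googleapis.com" pattern from the docs.
    public static string EndpointFor(string location)
    {
        if (string.IsNullOrWhiteSpace(location))
            throw new ArgumentException("A location such as \"us\" or \"eu\" is required.", nameof(location));

        string normalized = location.Trim().ToLowerInvariant();
        return normalized == "us" ? null : $"{normalized}-documentai.googleapis.com";
    }
}

// Usage with the real client library (not compiled here):
// string location = "eu";
// var builder = new DocumentProcessorServiceClientBuilder();
// string endpoint = DocumentAiEndpoints.EndpointFor(location);
// if (endpoint != null) builder.Endpoint = endpoint;
// DocumentProcessorServiceClient client = builder.Build();
// var name = ProcessorName.FromProjectLocationProcessor("ProjectID", location, "ProcessorID");
```

Passing the same `location` variable to both the endpoint helper and `ProcessorName.FromProjectLocationProcessor` means a processor created in one region cannot silently be called through another region's endpoint.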

Firefox not downloading properly displayed blob PDF

My clients are unable to download a PDF document when they try to save (Ctrl+S) a properly displayed PDF document in the Firefox (v. 49.0.2) browser.
I don't know whether this is a problem with my code or with the browser.
The only way I can download it is to click the "Download" button of the PDF viewer, but my clients want to save the file with Ctrl+S.
Please take a look at this picture:
And here is the Angular code where I try to open the file in the browser. It works in Chrome and Edge, and it also opens the PDF in Firefox. The response object is an $http response.
function openFile(response) {
var responseHeaders = response.headers();
var contentType = responseHeaders['content-type'];
var contentDisposition = responseHeaders['content-disposition'];
var fileName = contentDisposition.match(/filename="(.+)"/)[1];
fileName = fileName.substring(0, fileName.indexOf(';')-1);
var file = new Blob([response.data], { type: contentType });
if(contentType==='application/pdf') //YES content-type is PDF
{
try
{
var fileURL = URL.createObjectURL(file);
window.open(fileURL);
}
catch(err) //For Edge, just save a file
{
FileSaver.saveAs(file, fileName);
}
}
else //for other content types, just save a file
{
FileSaver.saveAs(file, fileName);
}
}
And this is my C# backend code:
byte[] report = service.GetReportCustomerCreditRatesCard();//render report
RenderFormatResolver renderResolver = new RenderFormatResolver(request.renderFormat);
HttpContent content = new ByteArrayContent(report);
var response = new HttpResponseMessage(HttpStatusCode.OK);
response.Content = content;
response.Content.Headers.ContentType = new MediaTypeHeaderValue(renderResolver.MIMEType);
response.Content.Headers.ContentLength = report.Length;
response.Content.Headers.ContentDisposition = new ContentDispositionHeaderValue("attachment") //"attachment", "inline"
{
FileName = String.Format("{0}." + renderResolver.FileNameExtension,
Translations.REPORT_FILENAME_CUSTOMER_CARD),
Name = Translations.REPORT_FILENAME_CUSTOMER_CARD
};
return response;

Try changing the Content-Disposition header in your response object so that it also holds the file name:
Content-Disposition: attachment; filename="document.pdf"
So something like:
response.Content.Headers.ContentDisposition = ContentDispositionHeaderValue.Parse("attachment; filename=\"document.pdf\"");
That might help. Not sure, but it's worth a try.

Download Excel File with WebAPI/MVC

I am trying to download an Excel file through Web API. The Excel file is created in a MemoryStream with the help of this post.
The Excel content is generated fine; however, I am unable to download the file, because the response itself is plain XML when I look at it in the Response tab of Chrome's Network tools. Following is my C# code:
var sheet = linq.ExportToExcel(userAddedList);
var stream = new MemoryStream();
var sw = new StreamWriter(stream);
sw.Write(sheet);
sw.Flush();
var result = new HttpResponseMessage(HttpStatusCode.OK) { Content = new ByteArrayContent(stream.GetBuffer()) };
result.Content.Headers.ContentDisposition = new ContentDispositionHeaderValue("attachment") { FileName = "Report.xml" };
result.Content.Headers.ContentType = new MediaTypeHeaderValue("application/ms-excel");
var response = ResponseMessage(result);
return response;
And this is how I call it through Angular.
var httpRequest = commonFunctions.getRequestObject("GET", requestURL, {}, null);
$http(httpRequest).then(function (response) {
vm.isProcessing = false;
}, function (error) { displayError(error); });

If the browsers you are targeting support the File API, you can use a Blob object. Here it is, wrapped in a function, taken from this fiddle:
var setFile = function( data, fileName, fileType ) {
// Set objects for file generation.
var blob, url, a, extension;
// Get time stamp for fileName.
var stamp = new Date().getTime();
// Set MIME type and encoding.
fileType = ( fileType || "text/csv;charset=UTF-8" );
extension = fileType.split( "/" )[1].split( ";" )[0];
// Set file name.
fileName = ( fileName || "ActiveVoice_" + stamp + "." + extension );
// Set data on blob.
blob = new Blob( [ data ], { type: fileType } );
// Set view.
if ( blob ) {
// Read blob.
url = window.URL.createObjectURL( blob );
// Create link.
a = document.createElement( "a" );
// Set link on DOM.
document.body.appendChild( a );
// Set link's visibility.
a.style = "display: none";
// Set href on link.
a.href = url;
// Set file name on link.
a.download = fileName;
// Trigger click of link.
a.click();
// Clear.
window.URL.revokeObjectURL( url );
} else {
// Handle error.
}
};
You would use it as part of your code like this:
$http(httpRequest).then(function (response) {
vm.isProcessing = false;
setFile(response.data, "Report.xls", "application/ms-excel");
}, function (error) { displayError(error); });
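On the server side of the question, `stream.GetBuffer()` hands back the stream's entire internal buffer, including unused capacity after the written data, so the download can end in a run of zero bytes; `ToArray()` copies only the bytes that were written. A minimal sketch of the same response built that way, assuming `sheet` is the XML string returned by `ExportToExcel`. `ExcelDownload.BuildExcelResponse` is a hypothetical name, and it uses `application/vnd.ms-excel`, the registered MIME type for Excel files, in place of `application/ms-excel`. In a Web API 2 `ApiController` the result would still be wrapped with `ResponseMessage(...)`.

```csharp
using System.IO;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;

public static class ExcelDownload
{
    // Hypothetical helper: wraps an already generated spreadsheet (XML text)
    // in an HttpResponseMessage that the browser treats as a download.
    public static HttpResponseMessage BuildExcelResponse(string sheet, string fileName)
    {
        byte[] body;
        using (var stream = new MemoryStream())
        {
            // UTF8Encoding(false) writes no byte order mark; leaveOpen keeps
            // the MemoryStream usable after the writer is disposed.
            using (var writer = new StreamWriter(stream, new UTF8Encoding(false), 1024, leaveOpen: true))
            {
                writer.Write(sheet);
            } // Disposing the writer flushes it into the stream.

            // ToArray() copies only the written bytes; GetBuffer() would also
            // return the unused, zero-filled capacity of the stream.
            body = stream.ToArray();
        }

        var result = new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new ByteArrayContent(body)
        };
        result.Content.Headers.ContentType = new MediaTypeHeaderValue("application/vnd.ms-excel");
        result.Content.Headers.ContentDisposition = new ContentDispositionHeaderValue("attachment")
        {
            FileName = fileName
        };
        return result;
    }
}
```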

How to return a PDF from a Web API application

I have a Web API project that is running on a server. It is supposed to return PDFs from two different kinds of sources: an actual PDF file, and a base64 string stored in a database. The trouble I'm having is sending the document back to a client MVC application. The rest of this post details everything that has happened and everything I've already tried.
I have written code that successfully converts both of those formats into an in-memory representation in C# and then back into PDF form. I have successfully transferred a byte array that was supposed to represent one of those documents, but I can't get it to display in the browser (in a new tab by itself). I always get some kind of "cannot be displayed" error.
Recently, I added a way to view the documents on the server side to make sure I can at least get that to work. It loads the document and creates a FileStreamResult from it, which it then returns as an (implicitly cast) ActionResult. I got that to return to a server-side MVC controller and put it into a simple return (no view), which displays the PDF just fine in the browser. However, going straight to the Web API function returns what looks like a JSON representation of a FileStreamResult.
When I try to get that to return properly to my client MVC application, it tells me that "_buffer" can't be directly set. The error message says, in effect, that some of the properties being returned and put into an object are private and can't be accessed.
The aforementioned byte-array representation of the PDF, when translated to a base64 string, doesn't seem to have the same number of characters as the "_buffer" string returned in the JSON by a FileStreamResult. It's missing about 26k 'A's at the end.
Any ideas about how to get this PDF to return correctly? I can provide code if necessary, but there has to be some known way to return a PDF from a server-side Web API application to a client-side MVC application and display it as a web page in a browser.
P.S. I do know that the "client-side" application isn't technically on the client side. It will also be a server application, but that shouldn't matter in this case. Relative to the Web API server, my MVC application is "client-side".
Code
To get the PDF:
private System.Web.Mvc.ActionResult GetPDF()
{
int bufferSize = 100;
int startIndex = 0;
long retval;
byte[] buffer = new byte[bufferSize];
MemoryStream stream = new MemoryStream();
SqlCommand command;
SqlConnection sqlca;
SqlDataReader reader;
using (sqlca = new SqlConnection(CONNECTION_STRING))
{
command = new SqlCommand((LINQ_TO_GET_FILE).ToString(), sqlca);
sqlca.Open();
reader = command.ExecuteReader(CommandBehavior.SequentialAccess);
try
{
while (reader.Read())
{
do
{
retval = reader.GetBytes(0, startIndex, buffer, 0, bufferSize);
stream.Write(buffer, 0, (int)retval); // write only the bytes actually read in this chunk
startIndex += bufferSize;
} while (retval == bufferSize);
}
}
finally
{
reader.Close();
sqlca.Close();
}
}
stream.Position = 0;
System.Web.Mvc.FileStreamResult fsr = new System.Web.Mvc.FileStreamResult(stream, "application/pdf");
return fsr;
}
API function that calls GetPDF:
[AcceptVerbs("GET","POST")]
public System.Web.Mvc.ActionResult getPdf()
{
System.Web.Mvc.ActionResult retVal = GetPDF();
return retVal;
}
For displaying PDF server-side:
public ActionResult getChart()
{
return new PDFController().GetPDF();
}
The code in the MVC application has changed a lot over time. The way it is right now, it doesn't get to the stage where it tries to display in browser. It gets an error before that.
public async Task<ActionResult> get_pdf(string[] args, string[] keys)
{
JObject jObj;
StringBuilder argumentsSB = new StringBuilder();
if (args.Length != 0)
{
argumentsSB.Append("?");
argumentsSB.Append(keys[0]);
argumentsSB.Append("=");
argumentsSB.Append(args[0]);
for (int i = 1; i < args.Length; i += 1)
{
argumentsSB.Append("&");
argumentsSB.Append(keys[i]);
argumentsSB.Append("=");
argumentsSB.Append(args[i]);
}
}
else
{
argumentsSB.Append("");
}
var arguments = argumentsSB.ToString();
using (var client = new HttpClient())
{
client.DefaultRequestHeaders.Accept.Clear();
client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
var response = await client.GetAsync(URL_OF_SERVER+"api/pdf/getPdf/" + arguments).ConfigureAwait(false);
jObj = (JObject)JsonConvert.DeserializeObject(response.Content.ReadAsStringAsync().Result);
}
return jObj.ToObject<ActionResult>();
}
The JSON I get from running the method directly from the Web API controller is:
{
"FileStream":{
"_buffer":"JVBER...NjdENEUxAA...AA==",
"_origin":0,
"_position":0,
"_length":45600,
"_capacity":65536,
"_expandable":true,
"_writable":true,
"_exposable":true,
"_isOpen":true,
"__identity":null},
"ContentType":"application/pdf",
"FileDownloadName":""
}
I shortened "_buffer" because it's ridiculously long.
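The "about 26k 'A's" mentioned above follows from the numbers in this JSON. `_length` is 45600 but `_capacity` is 65536, and the serializer apparently wrote out the whole internal buffer, so the last 19936 bytes of `_buffer` are zeros, which Base64 encodes as `A`. Base64 produces 4 * ceil(n / 3) characters: 60800 for 45600 bytes and 87384 for 65536 bytes, a difference of 26584 characters (26582 `A`s plus the closing `==`). The sketch below reproduces this with a `MemoryStream` filled in 100-byte chunks, as `GetPDF` does; the filler byte is arbitrary and `BufferVsLength` is a hypothetical name.

```csharp
using System;
using System.IO;
using System.Linq;

public static class BufferVsLength
{
    // Writes `length` bytes in 100-byte chunks (like GetPDF) and returns the
    // Base64 text of the written data, of the whole internal buffer, and the
    // stream's final capacity.
    public static (string written, string wholeBuffer, int capacity) Encode(int length)
    {
        var stream = new MemoryStream();
        byte[] chunk = Enumerable.Repeat((byte)0x25, 100).ToArray(); // arbitrary non-zero filler
        for (int offset = 0; offset < length; offset += chunk.Length)
        {
            stream.Write(chunk, 0, Math.Min(chunk.Length, length - offset));
        }

        string ofWritten = Convert.ToBase64String(stream.ToArray());
        // GetBuffer() exposes the full backing array (Capacity bytes), which is
        // what ended up in the serialized "_buffer" field.
        string ofBuffer = Convert.ToBase64String(stream.GetBuffer());
        return (ofWritten, ofBuffer, stream.Capacity);
    }
}
```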
I currently get the error message below on the return line of get_pdf(args,keys)
Exception Details: Newtonsoft.Json.JsonSerializationException: Could not create an instance of type System.Web.Mvc.ActionResult. Type is an interface or abstract class and cannot be instantiated. Path 'FileStream'.
Back when I was getting a blank PDF viewer (the viewer opened, but there was no file in it), I used this code:
public async Task<ActionResult> get_pdf(string[] args, string[] keys)
{
byte[] retArr;
StringBuilder argumentsSB = new StringBuilder();
if (args.Length != 0)
{
argumentsSB.Append("?");
argumentsSB.Append(keys[0]);
argumentsSB.Append("=");
argumentsSB.Append(args[0]);
for (int i = 1; i < args.Length; i += 1)
{
argumentsSB.Append("&");
argumentsSB.Append(keys[i]);
argumentsSB.Append("=");
argumentsSB.Append(args[i]);
}
}
else
{
argumentsSB.Append("");
}
var arguments = argumentsSB.ToString();
using (var client = new HttpClient())
{
client.DefaultRequestHeaders.Accept.Clear();
client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/pdf"));
var response = await client.GetAsync(URL_OF_SERVER+"api/webservice/" + method + "/" + arguments).ConfigureAwait(false);
retArr = await response.Content.ReadAsByteArrayAsync().ConfigureAwait(false);
}
var x = retArr.Skip(1).Take(retArr.Length-2).ToArray();
/*Response.Clear();
Response.ClearContent();
Response.ClearHeaders();
Response.ContentType = "application/pdf";
Response.AppendHeader("Content-Disposition", "inline;filename=document.pdf");
Response.BufferOutput = true;
Response.BinaryWrite(x);
Response.Flush();
Response.End();*/
return new FileStreamResult(new MemoryStream(x),MediaTypeNames.Application.Pdf);
}
The commented-out code is from some other attempts. When I was using that code, I was returning a byte array from the server. It looked like:
JVBER...NjdENEUx
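The `JVBER` prefix is a clue: it is what the ASCII bytes `%PDF` look like in Base64. When a Web API action returns a `byte[]` through a JSON formatter, the array is written as a quoted Base64 string, so `Skip(1)`/`Take(Length - 2)` only strips the quotes and leaves Base64 text rather than PDF bytes, which would explain the blank viewer. Decoding the string restores the file. A minimal sketch, assuming the server did use a JSON formatter (the output above suggests it); `JsonPdfPayload.DecodeJsonByteArray` is a hypothetical name, and it uses `System.Text.Json`, which maps `byte[]` to Base64. With Json.NET, `JsonConvert.DeserializeObject<byte[]>(body)` should do the same.

```csharp
using System;
using System.Text.Json;

public static class JsonPdfPayload
{
    // Hypothetical helper: turns a JSON response body such as "\"JVBERi0x...\""
    // back into the raw PDF bytes by Base64-decoding the JSON string.
    public static byte[] DecodeJsonByteArray(string responseBody)
    {
        return JsonSerializer.Deserialize<byte[]>(responseBody)
            ?? throw new FormatException("The response body was JSON null.");
    }
}

// Possible use in the client MVC action (not compiled here):
// string body = await response.Content.ReadAsStringAsync();
// byte[] pdf = JsonPdfPayload.DecodeJsonByteArray(body);
// return File(pdf, "application/pdf");
```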

Here is some server-side code that returns a PDF (Web API):
[HttpGet]
[Route("documents/{docid}")]
public HttpResponseMessage Display(string docid) {
HttpResponseMessage response = Request.CreateResponse(HttpStatusCode.BadRequest);
var documents = reader.GetDocument(docid);
if (documents != null && documents.Length == 1) {
var document = documents[0];
docid = document.docid;
byte[] buffer = new byte[0];
//generate pdf document
MemoryStream memoryStream = new MemoryStream();
MyPDFGenerator.New().PrintToStream(document, memoryStream);
//get buffer
buffer = memoryStream.ToArray();
//content length for use in header
var contentLength = buffer.Length;
//200
//successful
var statuscode = HttpStatusCode.OK;
response = Request.CreateResponse(statuscode);
response.Content = new StreamContent(new MemoryStream(buffer));
response.Content.Headers.ContentType = new MediaTypeHeaderValue("application/pdf");
response.Content.Headers.ContentLength = contentLength;
ContentDispositionHeaderValue contentDisposition = null;
if (ContentDispositionHeaderValue.TryParse("inline; filename=" + document.Name + ".pdf", out contentDisposition)) {
response.Content.Headers.ContentDisposition = contentDisposition;
}
} else {
var statuscode = HttpStatusCode.NotFound;
var message = String.Format("Unable to find resource. Resource \"{0}\" may not exist.", docid);
var responseData = responseDataFactory.CreateWithOnlyMetadata(statuscode, message);
response = Request.CreateResponse((HttpStatusCode)responseData.meta.code, responseData);
}
return response;
}
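One caveat with building the header by string concatenation and `TryParse`: if `document.Name` contains a space or another character that is not allowed in an unquoted token, the string no longer parses, `TryParse` returns `false`, and the response silently goes out without a `Content-Disposition` header. Setting `FileName` on a `ContentDispositionHeaderValue` lets the class add quotes when they are needed. A minimal sketch; `DispositionHeaders.BuildInline` is a hypothetical name, and it would replace the `TryParse` block with `response.Content.Headers.ContentDisposition = DispositionHeaders.BuildInline(document.Name);`.

```csharp
using System.Net.Http.Headers;

public static class DispositionHeaders
{
    // Hypothetical helper: builds "inline; filename=..." without hand-written quoting.
    public static ContentDispositionHeaderValue BuildInline(string baseName)
    {
        return new ContentDispositionHeaderValue("inline")
        {
            // The FileName setter quotes the value when it is not a valid
            // token, e.g. when it contains spaces.
            FileName = baseName + ".pdf"
        };
    }
}
```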
In a view you could do something like this:
<a href="api/documents/1234" target = "_blank" class = "btn btn-success" >View document</a>
which will call the web api and open the PDF document in a new tab in the browser.
Here is how I do basically the same thing from an MVC controller:
// NOTE: Original return type: FileContentResult, changed to ActionResult to allow for error results
[Route("{docid}/Label")]
public ActionResult Label(Guid docid) {
var timestamp = DateTime.Now;
var document = objectFactory.Create<Document>();
if (docid != Guid.Empty) {
var documents = reader.GetDocuments(docid);
if (documents.Length > 0) {
document = documents[0];
MemoryStream memoryStream = new MemoryStream();
var printer = MyPDFGenerator.New();
printer.PrintToStream(document, memoryStream);
Response.AppendHeader("Content-Disposition", "inline; filename=" + timestamp.ToString("yyyyMMddHHmmss") + ".pdf");
return File(memoryStream.ToArray(), "application/pdf");
} else {
return this.RedirectToAction(c => c.Details(docid));
}
}
return this.RedirectToAction(c => c.Index(null, null));
}
Hope this helps

I needed to return a PDF file from a .NET Core 3.1 Web API and found this excellent article:
https://codeburst.io/download-files-using-web-api-ae1d1025f0a9
Basically, you call:
var bytes = await System.IO.File.ReadAllBytesAsync(pathFileName);
return File(bytes, "application/pdf", Path.GetFileName(pathFileName));
The whole code is:
using Microsoft.AspNetCore.Mvc;
using System.IO;
using Reportman.Drawing;
using Reportman.Reporting;
using System.Text;
using System.Threading.Tasks;
[Route("api/[controller]")]
[ApiController]
public class PdfController : ControllerBase
{
[HttpGet]
[Route("ot/{idOT}")]
public async Task<ActionResult> OT(string idOT)
{
Report rp = new Report();
rp.LoadFromFile("ot-net.rep"); // File created with Reportman designer
rp.ConvertToDotNet();
// FixReport
rp.AsyncExecution = false;
PrintOutPDF printpdf = new PrintOutPDF();
// Perform the conversion from one encoding to the other.
byte[] unicodeBytes = Encoding.Convert(Encoding.ASCII, Encoding.Unicode, Encoding.ASCII.GetBytes($"Orden de trabajo {idOT}"));
string unicodeString = new string(Encoding.Unicode.GetChars(unicodeBytes));
// todo: convert to unicode
// e = Encoding.GetEncoding(unicodeString);
// System.Diagnostics.Trace.WriteLine(e);
if (rp.Params.Count > 0)
{
rp.Params[0].Value = unicodeString;
}
printpdf.FileName = $"ot-{idOT}.pdf";
printpdf.Compressed = false;
if (printpdf.Print(rp.MetaFile))
{
// Download Files using Web API. Changhui Xu. https://codeburst.io/download-files-using-web-api-ae1d1025f0a9
var bytes = await System.IO.File.ReadAllBytesAsync(printpdf.FileName);
return File(bytes, "application/pdf", Path.GetFileName(printpdf.FileName));
}
return StatusCode(500); // printpdf.Print() failed, so there is no PDF to return
}
}
A call to this API looks like: https://localhost:44387/api/pdf/ot/7
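The controller above has Reportman write `ot-{idOT}.pdf` into the process's working directory and never deletes it, so two simultaneous requests for the same `idOT` share one file, and old PDFs accumulate. A minimal sketch of a per-request temporary path plus read-and-delete, using only the base class library. It assumes `PrintOutPDF.FileName` accepts a full path, which is an assumption not confirmed by the original; `TempPdf`, `NewPath` and `ReadAndDeleteAsync` are hypothetical names.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class TempPdf
{
    // Hypothetical helper: a unique file name per request in the system temp folder.
    public static string NewPath(string id)
    {
        return Path.Combine(Path.GetTempPath(), $"ot-{id}-{Guid.NewGuid():N}.pdf");
    }

    // Reads the generated file and always removes it afterwards.
    public static async Task<byte[]> ReadAndDeleteAsync(string path)
    {
        try
        {
            return await File.ReadAllBytesAsync(path);
        }
        finally
        {
            File.Delete(path);
        }
    }
}

// Possible use inside the OT action (not compiled here):
// printpdf.FileName = TempPdf.NewPath(idOT);
// if (printpdf.Print(rp.MetaFile))
// {
//     var bytes = await TempPdf.ReadAndDeleteAsync(printpdf.FileName);
//     return File(bytes, "application/pdf", $"ot-{idOT}.pdf");
// }
```

The download name passed to `File(...)` stays `ot-{idOT}.pdf`, so the GUID only affects the file on disk, not what the browser saves.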
Reportman is a PDF generator you can find at:
https://reportman.sourceforge.io/
Enjoy!

Dropbox API Unable to upload a file Issue while uploading

I use HigLabo.Net.Dropbox to upload a file to Dropbox. I created an app named synch and I am trying to upload a file. Below is my code:
byte[] bytes = System.IO.File.ReadAllBytes(args[1]);
UploadFile(bytes,"sundas.jpg","/Apps/synch/");
public static void UploadFile(byte[] content, string filename, string target)
{
string App_key = "xxxxxxxxxxxxxxx";
string App_secret = "yyyyyyyyyyyyyy";
HigLabo.Net.OAuthClient ocl = null;
HigLabo.Net.AuthorizeInfo ai = null;
ocl = HigLabo.Net.Dropbox.DropboxClient.CreateOAuthClient(App_key, App_secret);
ai = ocl.GetAuthorizeInfo();
string RequestToken= ai.RequestToken;
string RequestTokenSecret= ai.RequestTokenSecret;
string redirect_url = ai.AuthorizeUrl;
AccessTokenInfo t = ocl.GetAccessToken(RequestToken, RequestTokenSecret);
string Token= t.Token;
string TokenSecret= t.TokenSecret;
DropboxClient cl = new DropboxClient(App_key, App_secret, Token, TokenSecret);
HigLabo.Net.Dropbox.UploadFileCommand ul = new HigLabo.Net.Dropbox.UploadFileCommand();
ul.Root = RootFolder.Sandbox;
Console.WriteLine(ul.Root);
ul.FolderPath = target;
ul.FileName = filename;
ul.LoadFileData(content);
Metadata md = cl.UploadFile(ul);
Console.WriteLine("END");
}
The code executes fine, but the file is not uploaded to Dropbox.
Am I missing something? Is the upload path correct? How can I check in Dropbox whether the file was uploaded?
Is there a setting I missed while creating the app? I am looking at the home page and expecting the file in the root folder. Am I correct?
Or do I need to look somewhere else?

Thanks @smarx and @Greg.
Below is the code that accomplishes the task. Thanks again for your support; I hope this is helpful for someone.
string filePath="C:\\Tim\\sundar.jpg";
RestClient client = new RestClient("https://api-content.dropbox.com/1/");
IRestRequest request = new RestRequest("files_put/auto/{path}", Method.PUT);
FileInfo fileInfo = new FileInfo(filePath);
long fileLength = fileInfo.Length;
request.AddHeader("Authorization", "Bearer FTXXXXXXXXXXXXXXXXXXXisqFXXXXXXXXXXXXXXXXXXXXXXXXXXXX");
request.AddHeader("Content-Length", fileLength.ToString());
request.AddUrlSegment("path", string.Format("Public/{0}", fileInfo.Name));
byte[] data = File.ReadAllBytes(filePath);
var body = new Parameter
{
Name = "file",
Value = data,
Type = ParameterType.RequestBody,
};
request.Parameters.Add(body);
IRestResponse response = client.Execute(request);
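The code above targets `https://api-content.dropbox.com/1/files_put`, which is part of Dropbox API v1. v1 has since been retired in favour of API v2, where uploads go to `https://content.dropboxapi.com/2/files/upload` with the target path passed as JSON in a `Dropbox-API-Arg` header and the file bytes sent as an `application/octet-stream` body. A minimal sketch with `HttpClient` that only builds the request; the access token, target path, and the `DropboxUpload.BuildUploadRequest` name are placeholders.

```csharp
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text.Json;

public static class DropboxUpload
{
    private const string UploadUrl = "https://content.dropboxapi.com/2/files/upload";

    // Hypothetical helper: builds (but does not send) a Dropbox API v2 upload request.
    public static HttpRequestMessage BuildUploadRequest(string accessToken, string dropboxPath, byte[] data)
    {
        var request = new HttpRequestMessage(HttpMethod.Post, UploadUrl);
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", accessToken);

        // Upload arguments travel as JSON in a header. System.Text.Json escapes
        // non-ASCII characters by default, which keeps the header value ASCII.
        string args = JsonSerializer.Serialize(new { path = dropboxPath, mode = "add", autorename = true });
        request.Headers.Add("Dropbox-API-Arg", args);

        request.Content = new ByteArrayContent(data);
        request.Content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
        return request;
    }
}

// Sending it (not run here):
// using var http = new HttpClient();
// byte[] data = File.ReadAllBytes(@"C:\Tim\sundar.jpg");
// using var request = DropboxUpload.BuildUploadRequest(token, "/sundar.jpg", data);
// HttpResponseMessage response = await http.SendAsync(request);
```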
