Reading Byte[] into Database from API - c#

I have been reading this as I would like to do this via LINQ. However, I have been unable to figure out how to read the data from the API.
When I output resource.Data.Body it says, Byte[].
When I output resource.Data.Size it says, 834234822. (or something like that)
I am trying to save the contents into my database like this:
newContent.ATTACHMENT = resource.Data.Body;
However, no data is ever loaded. I assume I have to loop through Body and store the contents in a variable, but I am not sure how.
Can someone help me connect the dots here?
Edit:
This is the source of the binary data I am trying to read http://dev.evernote.com/start/core/resources.php
Edit 2:
I am using the following code, which gives me binary data and saves it to the database. The data must be corrupt, though, because when I try to open the file, Windows Photo Viewer says it is corrupt or too large:
Resource resource = noteStore.getResource(authToken, attachment.Guid, true, false, true, true);
StringBuilder data = new StringBuilder();
foreach (byte b in resource.Data.Body)
{
    data.Append(Convert.ToString(b, 2).PadLeft(8, '0'));
}
...
newContent.ATTACHMENT = System.Text.Encoding.ASCII.GetBytes(data.ToString());

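Given that resource.Data.Body is a byte[] and newContent.ATTACHMENT is a System.Data.Linq.Binary, use the System.Data.Linq.Binary constructor that takes a byte[]: http://msdn.microsoft.com/en-us/library/bb351422.aspx
The loop in Edit 2 corrupts the file because it turns every byte into eight ASCII '0'/'1' characters, so the stored value is text eight times the size of the original rather than the image bytes themselves.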
newContent.ATTACHMENT = new System.Data.Linq.Binary(resource.Data.Body);

Related

NReco HTML-to-PDF Generator GeneratePdfFromFiles method throws exception

I have a fully working system for creating single page PDFs from HTML as below;
After initializing the converter
var nRecoHTMLToPDFConverter = new HtmlToPdfConverter();
nRecoHTMLToPDFConverter = PDFGenerator.PDFSettings(nRecoHTMLToPDFConverter);
string PDFContents;
PDFContents is an HTML string which is being populated.
The following command works perfectly and gives me the byte[] which I can return;
createDTO.PDFContent = nRecoHTMLToPDFConverter.GeneratePdf(PDFContents);
The problem arises when I want to test and develop the multi page functionality of the NReco library and change an arbitrary number of HTML pages to PDF pages.
var stringArray = new string[]
{
PDFContents, PDFContents,
};
var stream = new MemoryStream();
nRecoHTMLToPDFConverter.GeneratePdfFromFiles(stringArray, null, stream);
var mybyteArray = stream.ToArray();
The PDFContents value is exactly the same as above. On paper, this should give me the byte array for two identical PDF pages. However, the call to GeneratePdfFromFiles throws the following exception:
WkHtmlToPdfException: Exit with code 1 due to network error: HostNotFoundError (exit code: 1)
Please help me resolve this if you have experience with this library and its complexities. I have a feeling that I'm not familiar with the proper use of a Stream object in this scenario. I've tested the working single page line and the malfunctioning multi page lines on the same method call so their context would be identical.
Many thanks
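The GeneratePdfFromFiles method you used expects an array of file names (or URLs), not HTML content: https://www.nrecosite.com/doc/NReco.PdfGenerator/?topic=html/M_NReco_PdfGenerator_HtmlToPdfConverter_GeneratePdfFromFiles_1.htm
Passing HTML strings makes the converter treat them as locations to load, which is the likely cause of the HostNotFoundError. If you work with HTML content as .NET strings, you can simply save each string to a temporary file, generate the PDF from those files, and delete them afterwards.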

PdfTextExtractor.GetTextFromPage suddenly giving empty string

We've been using the iTextSharp libraries for a couple of years now within an SSIS process to read some values out of a set of PDF exam documents. Everything has been running nicely until this week when suddenly we are getting the return of an empty string when calling the PdfTextExtractor.GetTextFromPage method. I'll include the code here:
// Read the data from the blob column where the PDF exists
byte[] byteBuffer = Row.FileData.GetBlobData(0, (int)Row.FileData.Length);
using (var pdfReader = new PdfReader(byteBuffer))
{
    // Here is the important stuff
    var extractStrategy = new LocationTextExtractionStrategy();
    // This call will extract the page with the proper data on it depending on the exam type
    // 1-page exams = NBOME - need to read first page for exam result data
    // 2-page exams = NBME - need to read second page for exam result data
    // The next two statements utilize this construct.
    var vendor = pdfReader.NumberOfPages == 1 ? "NBOME" : "NBME";
    // *** THIS NEXT LINE GIVES THE EMPTY STRING
    var newText = PdfTextExtractor.GetTextFromPage(pdfReader, pdfReader.NumberOfPages == 1 ? 1 : 2, extractStrategy);
    var stringList = newText.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.None);
    var fileParser = FileParseFactory.GetFileParse(stringList, vendor);
    // Populate our output variables
    Row.ParsedExamName = fileParser.GetExamName(stringList);
    Row.DateParsed = DateTime.Now;
    Row.ParsedId = fileParser.GetStudentId(stringList);
    Row.ParsedTestDate = fileParser.GetTestDate(stringList);
    Row.ParsedTestDateString = fileParser.GetTestDateAsString(stringList);
    Row.ParsedName = fileParser.GetStudentName(stringList);
    Row.ParsedTotalScore = fileParser.GetTestScore(stringList);
    Row.ParsedVendor = vendor;
}
This does not happen for all PDFs, by the way. To explain more, we are reading in exam files. One of the exam types (NBME) seems to be read just fine, but the other type (NBOME) is not, even though the NBOME files were being read correctly until this week.
This leads me to think it is an internal format change of the PDF file itself.
Also, another bit of information is that the actual pdfReader has data - I can get a byte[] array of the data - but the call to get any text is simply giving me empty.
I'm sorry I'm not able to show any exam data or files - that information is sensitive.
Has anybody seen something like this? If so, any possible solutions?
Well - we have found our answer. The user was originally going to the NBOME web site and downloading the PDF exam result files to import into my parsing system. Like I said, this worked for quite some time. Recently (this week), however, the user stopped downloading the files and instead used a print-to-PDF feature to print the PDF files as new PDFs. When she did that, the problem occurred.
Bottom line, it looks like printing the PDF as a PDF may have been changing something under the covers, so that reading the PDF via iTextSharp did not fail but returned an empty string. She should have just continued downloading the files directly.
Thanks to those who offered some comments!

Decode Stream to CSV in Python by Byte (Translate from C# code)

I am trying to consume a streamed response in Python from a SOAP API and output a CSV file. The response is a Base64-encoded string, which I do not know what to do with. The API documentation also says that the response must be read into a destination buffer by buffer.
Here is the C# code that was provided by the API's documentation:
byte[] buffer = new byte[4000];
bool endOfStream = false;
int bytesRead = 0;
long totalBytes = 0; // declaration missing from the quoted sample; added so the snippet compiles
using (FileStream localFileStream = new FileStream(destinationPath, FileMode.Create, FileAccess.Write))
{
    using (Stream remoteStream = client.DownloadFile(jobId))
    {
        while (!endOfStream)
        {
            // Read up to one buffer of data; zero bytes read means end of stream
            bytesRead = remoteStream.Read(buffer, 0, buffer.Length);
            if (bytesRead > 0)
            {
                localFileStream.Write(buffer, 0, bytesRead);
                totalBytes += bytesRead;
            }
            else
            {
                endOfStream = true;
            }
        }
    }
}
I have tried many different things to get this stream into a readable CSV file, but none have worked.
with open('test.csv', 'w') as f: f.write(FileString)
Returns a csv with the base64 string spread over multiple lines
Here is my latest attempt:
with open('csvfile13.csv', 'wb') as csvfile:
    FileString = client.service.DownloadFile(yyy.JobId, False)
    stream = io.BytesIO(str(FileString))
    with open(stream,"rt",4000) as readstream:
        csvfile.write(readstream)
This produces the error:
TypeError: coercing to Unicode: need string or buffer, _io.BytesIO
Any help would be greatly appreciated, even if it is just to point me in the right direction. I will be sure to award the points to whoever is the most helpful, even if I do not completely solve the issue!
I have asked several questions similar to this one, but I have yet to find an answer that works completely:
What is the Python equivalent to FileStream in C#?
Write Streamed Response(file-like object) to CSV file Byte by Byte in Python
How to replicate C# 'byte' and 'Write' in Python
Let me know if you need further clarification!
Update:
I have tried print(base64.b64decode(str(FileString)))
This gives me a page full of webdings like
]�P�O�J��Y��KW �
I have also tried
for data in client.service.DownloadFile(yyy.JobId, False):
print data
But this just loops through the output character by character like any other string.
I have also managed to get a long string of bytes like \xbc\x97_D\xfb(not actual bytes, just similar format) by decoding the entire string, but I do not know how to make this readable.
Edit: Corrected the output of the sample python, added more example code, formatting
It sounds like you need to use the base64 module to decode the downloaded data.
It might be as simple as:
# Open in binary mode ('wb') so the decoded bytes are written unchanged
with open(destinationPath, 'wb') as localFile:
    remoteFile = client.service.DownloadFile(yyy.JobId, False)
    remoteData = str(remoteFile).decode('base64')
    localFile.write(remoteData)
I suggest you break the problem down and determine what data you have at each stage. For example what exactly are you getting back from client.service.DownloadFile?
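The buffer-by-buffer loop from the C# sample in the question can be approximated in Python once the Base64 text has been decoded. The following is a minimal sketch written for Python 3, not the API's official client code; the function name save_base64_payload and its parameters are hypothetical, and it assumes the whole payload arrives as one Base64 string.

```python
import base64
import io


def save_base64_payload(payload, destination_path, chunk_size=4000):
    """Decode a Base64 payload and write it to disk chunk by chunk.

    Returns the total number of bytes written. The name and parameters
    are illustrative; they are not part of the SOAP API.
    """
    if not isinstance(payload, bytes):
        # Base64 text is plain ASCII, so this encoding step is lossless.
        payload = str(payload).encode("ascii")
    remote_stream = io.BytesIO(base64.b64decode(payload))

    total_bytes = 0
    # Binary mode ("wb") keeps the decoded bytes intact on every platform.
    with open(destination_path, "wb") as local_file:
        while True:
            chunk = remote_stream.read(chunk_size)
            if not chunk:  # an empty read signals the end of the stream
                break
            local_file.write(chunk)
            total_bytes += len(chunk)
    return total_bytes
```

The returned byte count plays the role of totalBytes in the C# loop and gives a quick sanity check against the expected file size.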
Decoding your sample downloaded data (given in the comments):
'UEsYAItH7brgsgPutAG\AoAYYAYa='.decode('base64')
gives
'PK\x18\x00\x8bG\xed\xba\xe0\xb2\x03\xee\xb4\x01\x80\xa0\x06\x18\x01\x86'
This looks suspiciously like a ZIP file header. I suggest you rename the file .zip and open it as such to investigate.
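Instead of renaming the file by hand, the decoded bytes can be checked programmatically. This is a small Python 3 sketch using the standard zipfile module; the helper name describe_payload is hypothetical, and it assumes the decoded data is already held in memory as bytes.

```python
import io
import zipfile


def describe_payload(data):
    """Report whether decoded bytes form a ZIP archive and list its members.

    Returns ("zip", [member names]) for a readable archive,
    otherwise ("raw", []).
    """
    if zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            return "zip", archive.namelist()
    return "raw", []
```

If the result is "zip", the member names show which entry holds the CSV; if it is "raw", the decoded bytes are probably the file content itself.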
If remoteData is a ZIP something like the following should extract and write your CSV.
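import io
import zipfile

remoteFile = client.service.DownloadFile(yyy.JobId, False)
remoteData = str(remoteFile).decode('base64')
# Wrap the decoded bytes in a file-like object so zipfile can read them
zipStream = io.BytesIO(remoteData)
z = zipfile.ZipFile(zipStream, 'r')
# Read the first (and presumably only) member of the archive
csvData = z.read(z.infolist()[0])
# Binary mode avoids newline translation of the extracted bytes
with open(destinationPath, 'wb') as localFile:
    localFile.write(csvData)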
Note: BASE64 can have some variations regarding padding and alternate character mapping but once you can see the data it should be reasonably clear what you need. Of course carefully read the documentation on your SOAP interface.
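The padding and alternate-alphabet variations mentioned in the note can be handled with a small amount of normalization before decoding. This is a hedged Python 3 sketch; decode_base64_lenient is a hypothetical helper name, and it assumes the only variations are embedded whitespace, missing '=' padding, and the URL-safe alphabet ('-' and '_' in place of '+' and '/').

```python
import base64


def decode_base64_lenient(text):
    """Decode standard or URL-safe Base64 text, restoring missing padding."""
    # Drop whitespace and line breaks that some transports insert.
    cleaned = "".join(text.split())
    # Map the URL-safe alphabet back to the standard one.
    cleaned = cleaned.replace("-", "+").replace("_", "/")
    # Base64 text must be a multiple of four characters long.
    cleaned += "=" * (-len(cleaned) % 4)
    return base64.b64decode(cleaned)
```

If the service uses some other variant, this normalization will not be enough, so the SOAP interface documentation remains the authority on the exact encoding.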
Are you sure FileString is a Base64 string? Based on the source code here, suds.sax.text.Text is a subclass of Unicode. You can write this to a file as you would a normal string but whatever you use to read the data from the file may corrupt it unless it's UTF-8-encoded.
You can try writing your Text object to a UTF-8-encoded file using io.open:
import io
with io.open('/path/to/my/file.txt', 'w', encoding='utf_8') as f:
    f.write(FileString)
Bear in mind, your console or text editor may have trouble displaying non-ASCII characters but that doesn't mean they're not encoded properly. Another way to inspect them is to open the file back up in the Python interactive shell:
import io
with io.open('/path/to/my/file.txt', 'r', encoding='utf_8') as f:
    next(f) # displays the representation of the first line of the file as a Unicode object
In Python 3, you can even use the built-in csv module to parse the file. In Python 2, however, you'll need to pip install backports.csv because the built-in module doesn't work with Unicode objects:
from backports import csv
import io
with io.open('/path/to/my/file.txt', 'r', encoding='utf_8') as f:
    r = csv.reader(f)
    next(r) # displays the representation of the first line of the file as a list of Unicode objects (each value separated)

Convert HttpPostedFileWrapper to something not in System.Web

I've implemented a file upload in ASP.NET MVC. I've created a view model that receives the uploaded file as an HttpPostedFileWrapper, and within the controller action I can save the file to disk.
However, I want to perform the actual save in a service method that lives in a class library which doesn't reference System.Web. I therefore can't pass the HttpPostedFileWrapper object to the service method.
Does anyone know how to achieve this, either by receiving the file as a different object or by converting it to something else before passing it on? The only way I can think of is to read the content of the file into a MemoryStream and pass this along with the other parameters, such as the file name, individually, but I wondered if there was a better way.
Thanks
The best approach would probably be to retrieve the file data (as a byte[]) and the name of the file (as a string) and pass those along to your service, similar to the approach you mentioned:
public void UploadFile(HttpPostedFileWrapper file)
{
    // Ensure a file is present
    if (file != null)
    {
        // Store the file data
        byte[] data = null;
        // Read the file data into an array
        using (var reader = new BinaryReader(file.InputStream))
        {
            data = reader.ReadBytes(file.ContentLength);
        }
        // Call your service here, passing along the data and file name
        UploadFileViaService(file.FileName, data);
    }
}
Since a byte[] and a string are very basic types, you should have no problem passing them to another service. A Stream is likely to work as well, but streams can be prone to issues such as being closed before they are read, whereas a byte[] already holds all of your content.

Convert the contents of a byte array to string

I have a byte array that contains the data of an uploaded file which happens to be a Resume of an employee(.doc file). I did it with the help of the following lines of code
AppSettingsReader rd = new AppSettingsReader();
FileUpload arr = (FileUpload)upresume;
Byte[] arrByte = null;
if (arr.HasFile && arr.PostedFile != null)
{
    //To create a PostedFile
    HttpPostedFile File = upresume.PostedFile;
    //Create byte Array with file len
    arrByte = new Byte[File.ContentLength];
    //force the control to load data in array
    File.InputStream.Read(arrByte, 0, File.ContentLength);
}
Now, I would like to get the contents of the uploaded file(resume) in string format either from the byte array or any other methods.
PS: 'contents' literally refers to the contents of the resume; for example if the resume(uploaded file) contains a word 'programming', I would like to have the same word contained in the string.
Please help me to solve this.
I worked on a similar project a few years ago. Long story short, I ended up reconstructing the file and saving it on the server, then programmatically converting it to PDF, and then indexing the contents of the PDF. This proved much easier in practice at the time.
Alternatively, if you can restrict resume uploads to the docx file format, you can use Microsoft's OpenXML library to parse and index the content very easily. In practice, though, this may cause usability issues for users of the web site.
