I have a byte array that contains the data of an uploaded file, which happens to be an employee's resume (a .doc file). I read it with the following code:
FileUpload arr = (FileUpload)upresume;
byte[] arrByte = null;
if (arr.HasFile && arr.PostedFile != null)
{
    // Get the posted file from the FileUpload control
    HttpPostedFile postedFile = arr.PostedFile;
    // Create a byte array sized to the file length
    arrByte = new byte[postedFile.ContentLength];
    // Stream.Read can return fewer bytes than requested, so keep reading until the array is full
    int offset = 0;
    while (offset < arrByte.Length)
    {
        int read = postedFile.InputStream.Read(arrByte, offset, arrByte.Length - offset);
        if (read == 0) break; // unexpected end of stream
        offset += read;
    }
}
Now I would like to get the contents of the uploaded file (the resume) as a string, either from the byte array or by some other method.
PS: 'contents' literally means the text of the resume; for example, if the resume (the uploaded file) contains the word 'programming', I would like the string to contain that same word.
Please help me to solve this.
I worked on a similar project a few years ago. Long story short: I ended up reconstructing the file and saving it on the server, then programmatically converting it to PDF and indexing the contents of the PDF. At the time, this proved much easier in practice.
Alternatively, if you can restrict resume uploads to the .docx format, you can use Microsoft's Open XML SDK to parse and index the content very easily. In practice, though, that restriction may cause usability issues for users of the website.
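For illustration of why .docx is so much easier: a .docx file is a ZIP package whose main body lives in word/document.xml, with the visible text inside w:t elements. Below is a rough Python 3 sketch of that idea using only the standard library's zipfile and xml.etree (not the Open XML SDK itself); docx_bytes_to_text is a hypothetical helper name. It reads only the main body (no headers, footers or comments) and will not work on legacy binary .doc files like the one in the question.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by <w:p> (paragraph) and <w:t> (text) elements
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_bytes_to_text(data: bytes) -> str:
    """Extract plain text from the main body of a .docx held in memory.

    Hypothetical helper: reads only word/document.xml and joins the text
    runs of each paragraph, one paragraph per line.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        # Concatenate every text run inside this paragraph
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return "\n".join(paragraphs)
```

With the uploaded bytes in hand, `"programming" in docx_bytes_to_text(upload_bytes)` would then answer the kind of question the asker has in mind.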
I am trying to consume a streamed response from a SOAP API in Python and output a CSV file. The response is a base64-encoded string, which I do not know what to do with. Also, the API documentation says that the response must be read into a destination buffer by buffer.
Here is the C# code provided in the API's documentation:
byte[] buffer = new byte[4000];
bool endOfStream = false;
int bytesRead = 0;
long totalBytes = 0; // running count of bytes copied
using (FileStream localFileStream = new FileStream(destinationPath, FileMode.Create, FileAccess.Write))
{
    using (Stream remoteStream = client.DownloadFile(jobId))
    {
        // Read the remote stream one buffer at a time until Read returns 0
        while (!endOfStream)
        {
            bytesRead = remoteStream.Read(buffer, 0, buffer.Length);
            if (bytesRead > 0)
            {
                // Write only the bytes actually read in this pass
                localFileStream.Write(buffer, 0, bytesRead);
                totalBytes += bytesRead;
            }
            else
            {
                endOfStream = true;
            }
        }
    }
}
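For reference, the same buffer-by-buffer loop translates to Python 3 roughly as follows. This is a sketch that assumes the SOAP client hands back a file-like object with a read() method, which may not be the case here (a plain string would need decoding first); copy_in_chunks is a hypothetical name.

```python
import io


def copy_in_chunks(source, destination, buffer_size=4000):
    """Copy a readable binary stream to a writable one, buffer by buffer.

    Mirrors the C# loop: read up to buffer_size bytes, write exactly what
    was read, and stop when read() returns b"" (end of stream).
    Returns the total number of bytes copied.
    """
    total_bytes = 0
    while True:
        chunk = source.read(buffer_size)
        if not chunk:  # an empty bytes object signals end of stream
            break
        destination.write(chunk)
        total_bytes += len(chunk)
    return total_bytes


if __name__ == "__main__":
    # Demo with in-memory streams; a real remote stream and open(path, "wb") work the same way
    remote = io.BytesIO(b"col1,col2\n1,2\n")
    local = io.BytesIO()
    print(copy_in_chunks(remote, local))  # prints 14
```

If you don't need the byte count, the standard library's shutil.copyfileobj(source, destination, 4000) performs the same loop.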
I have tried many different things to turn this stream into a readable CSV file, but none have worked.
with open('test.csv', 'w') as f: f.write(FileString)
This produces a CSV file with the base64 string spread over multiple lines.
Here is my latest attempt:
with open('csvfile13.csv', 'wb') as csvfile:
    FileString = client.service.DownloadFile(yyy.JobId, False)
    stream = io.BytesIO(str(FileString))
    with open(stream, "rt", 4000) as readstream:
        csvfile.write(readstream)
This produces the error:
TypeError: coercing to Unicode: need string or buffer, _io.BytesIO
Any help would be greatly appreciated, even if it is just a pointer in the right direction. I will be sure to award the points to whoever is most helpful, even if I do not completely solve the issue!
I have asked several questions similar to this one, but I have yet to find an answer that works completely:
What is the Python equivalent to FileStream in C#?
Write Streamed Response(file-like object) to CSV file Byte by Byte in Python
How to replicate C# 'byte' and 'Write' in Python
Let me know if you need further clarification!
Update:
I have tried print(base64.b64decode(str(FileString)))
This gives me a page full of unreadable characters like
]�P�O�J��Y��KW �
I have also tried
for data in client.service.DownloadFile(yyy.JobId, False):
    print data
But this just loops through the output character by character, like any other string.
I have also managed to get a long string of bytes like \xbc\x97_D\xfb (not the actual bytes, just a similar format) by decoding the entire string, but I do not know how to make this readable.
Edit: corrected the output of the sample Python code, added more example code, and fixed formatting.
It sounds like you need to use the base64 module to decode the downloaded data.
It might be as simple as:
import base64

with open(destinationPath, 'wb') as localFile:  # binary mode: the decoded data is raw bytes
    remoteFile = client.service.DownloadFile(yyy.JobId, False)
    remoteData = base64.b64decode(str(remoteFile))
    localFile.write(remoteData)
I suggest you break the problem down and determine what data you have at each stage. For example, what exactly are you getting back from client.service.DownloadFile?
Decoding your sample downloaded data (given in the comments):
'UEsYAItH7brgsgPutAG\AoAYYAYa='.decode('base64')
gives
'PK\x18\x00\x8bG\xed\xba\xe0\xb2\x03\xee\xb4\x01\x80\xa0\x06\x18\x01\x86'
This looks suspiciously like a ZIP file header (ZIP files begin with the bytes "PK"). I suggest you give the file a .zip extension and open it as an archive to investigate.
If remoteData is a ZIP archive, something like the following should extract and write your CSV:
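To check that theory in code rather than by renaming the file, here is a Python 3 sketch (looks_like_zip is a hypothetical helper). It looks for the "PK" signature and then asks zipfile.is_zipfile, which searches for the archive's end-of-central-directory record, so a payload that merely starts with "PK" is not accepted on its own.

```python
import io
import zipfile


def looks_like_zip(data: bytes) -> bool:
    """Return True if the decoded payload appears to be a complete ZIP archive.

    Cheap check first (ZIP files start with b"PK"), then let zipfile confirm
    that an end-of-central-directory record is present.
    """
    return data.startswith(b"PK") and zipfile.is_zipfile(io.BytesIO(data))
```

If this returns False for the decoded data, the payload is probably not a ZIP at all and the base64 step is the place to look again.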
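import base64
import io
import zipfile

remoteFile = client.service.DownloadFile(yyy.JobId, False)
# Decode the base64 text into raw bytes
remoteData = base64.b64decode(str(remoteFile))
# Wrap the bytes in a file-like object so zipfile can read the archive from memory
zipStream = io.BytesIO(remoteData)
with zipfile.ZipFile(zipStream, 'r') as z:
    # Read the first member of the archive (assumed to be the CSV)
    csvData = z.read(z.infolist()[0])
with open(destinationPath, 'wb') as localFile:
    localFile.write(csvData)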
Note: Base64 has some variants that differ in padding and in the characters used for the last two symbols (for example, the URL-safe alphabet uses - and _ instead of + and /), but once you can see the data it should be reasonably clear what you need. And of course, read the documentation of your SOAP interface carefully.
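A Python 3 sketch of tolerating those common variations, under the assumption that the payload might use the URL-safe alphabet, might have had its '=' padding stripped, or might contain line breaks (decode_base64_lenient is a hypothetical name):

```python
import base64


def decode_base64_lenient(text: str) -> bytes:
    """Decode standard or URL-safe base64, restoring stripped '=' padding."""
    cleaned = "".join(text.split())  # drop line breaks and spaces some encoders insert
    cleaned = cleaned.replace("-", "+").replace("_", "/")  # URL-safe -> standard alphabet
    cleaned += "=" * (-len(cleaned) % 4)  # pad to a multiple of 4 characters
    return base64.b64decode(cleaned)
```

Padding is computed after whitespace is removed, since line breaks would otherwise throw off the length check.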
Are you sure FileString is a Base64 string? Based on the suds source code, suds.sax.text.Text is a subclass of unicode. You can write it to a file as you would a normal string, but whatever you use to read the data back from the file may corrupt it unless the file is UTF-8-encoded.
You can try writing your Text object to a UTF-8-encoded file using io.open:
import io
with io.open('/path/to/my/file.txt', 'w', encoding='utf_8') as f:
    f.write(FileString)
Bear in mind, your console or text editor may have trouble displaying non-ASCII characters but that doesn't mean they're not encoded properly. Another way to inspect them is to open the file back up in the Python interactive shell:
import io
with io.open('/path/to/my/file.txt', 'r', encoding='utf_8') as f:
    next(f)  # displays the representation of the first line of the file as a unicode object
In Python 3 you can even use the built-in csv module to parse the file. In Python 2, however, you'll need to pip install backports.csv, because the built-in module doesn't work with unicode objects:
from backports import csv
import io

# newline='' lets the csv reader handle line endings itself
with io.open('/path/to/my/file.txt', 'r', encoding='utf_8', newline='') as f:
    r = csv.reader(f)
    next(r)  # displays the first line of the file as a list of unicode objects (one per value)
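In Python 3 the decoded bytes can also be parsed directly, without writing a file first. A sketch assuming the payload is UTF-8-encoded CSV (parse_csv_bytes is a hypothetical name):

```python
import csv
import io


def parse_csv_bytes(data: bytes, encoding: str = "utf-8") -> list:
    """Decode CSV bytes to text and parse them into a list of rows.

    Assumes UTF-8 unless told otherwise; newline="" lets the csv module
    handle \r\n line endings and quoted fields itself.
    """
    text_stream = io.StringIO(data.decode(encoding), newline="")
    return list(csv.reader(text_stream))
```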
My program needs to read any type of file from a given directory path and it has to write that information into a byte array.
string combine = Path.Combine(precombine, filename);
string content = System.IO.File.ReadAllText(combine);
This way I can read a text file; however, I have to read all kinds of files, such as music or images, and write them into a byte array.
Use the File.ReadAllBytes method:
byte[] fileContent = System.IO.File.ReadAllBytes(combine);
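The same text-versus-binary distinction exists in Python: reading in binary mode returns the file's raw bytes whatever the file type. A minimal sketch mirroring the Path.Combine plus ReadAllBytes combination (read_any_file is a hypothetical helper):

```python
from pathlib import Path


def read_any_file(directory: str, filename: str) -> bytes:
    """Read any file (text, image, audio, ...) into a bytes object.

    read_bytes() opens the file in binary mode, so nothing is decoded or
    translated, unlike reading it as text.
    """
    return Path(directory, filename).read_bytes()
```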
In Microsoft CRM we have an attachment that should be fetched and downloaded. So I have a byte array that represents the fetched file:
byte[] fileContent = Convert.FromBase64String(query.DocumentBody);
If I use the code below, the file can be downloaded, but the file path has to be hard-coded (like C:/<folder name>/), and I don't want that.
using (FileStream fileStream = new FileStream(Path.Combine(path, query.FileName), FileMode.Create))
{
    byte[] fileContent = Convert.FromBase64String(query.DocumentBody);
    // FileMode.Create truncates an existing file; OpenOrCreate would leave stale trailing bytes
    fileStream.Write(fileContent, 0, fileContent.Length);
    //Response.OutputStream.WriteByte(fileContent);
}
How can I download the file from a byte array? Every approach I found while searching needs a file path, and I can't provide one, since the object is a byte array.
I'm not sure exactly what your problem is, but the following should write the byte array to the output stream. You may need a "Content-Disposition" header for the file name and a "Content-Type" header so that the browser offers a download instead of trying to open the file directly:
Response.AddHeader("Content-Disposition", "attachment; filename=" + query.FileName);
Response.OutputStream.Write(fileContent, 0, fileContent.Length);
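Sending bytes straight to the response with Content-Type and Content-Disposition headers is not specific to ASP.NET. As a framework-neutral illustration, here is a minimal Python WSGI sketch (make_download_app is a hypothetical name; file names containing quotes or non-ASCII characters would need the encoding described in RFC 6266, which this skips):

```python
def make_download_app(file_content: bytes, file_name: str):
    """Return a WSGI application that serves file_content as a download.

    No file on disk is involved: the bytes go straight into the response
    body, and Content-Disposition asks the browser to offer "Save as"
    under file_name rather than display the data inline.
    """
    def app(environ, start_response):
        headers = [
            ("Content-Type", "application/octet-stream"),
            ("Content-Disposition", 'attachment; filename="%s"' % file_name),
            ("Content-Length", str(len(file_content))),
        ]
        start_response("200 OK", headers)
        return [file_content]

    return app
```

For a quick local test, wsgiref.simple_server.make_server("", 8000, make_download_app(data, "resume.doc")).serve_forever() would serve it.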
I have the whole MS Word file saved in a byte array. I want to load it the way I would if it were on the file system, but with minimal use of Microsoft.Office.Interop.Word, because it is very slow when it gets to the .Open(args[]) part.
Try this:
byte[] bte = File.ReadAllBytes("E:\\test.doc"); // read the file into a byte array
File.WriteAllBytes(@"E:\test1.doc", bte); // write the same bytes back out to a new file
There is no supported way to do this directly with Interop.Word, because none of its methods accept a byte array.
As a workaround, you can use a temporary file in the following way:
// using Word = Microsoft.Office.Interop.Word;
// byte[] fileBytes = getFileBytesFromDB();
var tmpFile = Path.GetTempFileName();
File.WriteAllBytes(tmpFile, fileBytes);
Word.Application app = new Word.Application();
Word.Document doc = app.Documents.Open(tmpFile); // open the temporary copy, not the original path
// .. do your stuff here ...
doc.Save(); // write any changes back to the temporary file
doc.Close();
app.Quit();
byte[] newFileBytes = File.ReadAllBytes(tmpFile);
File.Delete(tmpFile);
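The temporary-file workaround is a general pattern for handing in-memory bytes to an API that only accepts a path. A Python 3 sketch of the pattern (not Word-specific: process_via_temp_file and its process callback are hypothetical stand-ins for the Interop calls):

```python
import os
import tempfile


def process_via_temp_file(data: bytes, process, suffix: str = ".doc") -> bytes:
    """Run a path-only API over in-memory bytes using a temporary file.

    `process` is any callable that takes a file path and may modify the file
    in place. The possibly modified bytes are read back, and the temporary
    file is always deleted.
    """
    fd, tmp_path = tempfile.mkstemp(suffix=suffix)  # keep an extension the tool recognizes
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        process(tmp_path)
        with open(tmp_path, "rb") as tmp:
            return tmp.read()
    finally:
        os.remove(tmp_path)
```

The try/finally plays the role of File.Delete in the C# version, but also cleans up when the processing step throws.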
For additional info, read this post on my blog.
The method public static byte[] ReadAllBytes(string path) returns the entire contents of the file as a byte array. You don't have to worry about the stream, as the MSDN documentation says:
"Given a file path, this method opens the file, reads the contents of the file into a byte array, and then closes the file."
Check out this link if you want more information
I am building a C# desktop application and I need to save a file into a database. I have a file chooser that gives me the correct path of the file. Now my question is how to save that file into the database using its path.
It really depends on the type and size of the file. If it's a text file, then you could use File.ReadAllText() to get a string that you can save in your database.
If it's not a text file, then you could use File.ReadAllBytes() to get the file's binary data, and then save that to your database.
Be careful, though: databases are not a great place to store large files (you may run into performance issues).
byte[] buff;
using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
using (BinaryReader br = new BinaryReader(fs))
{
    // FileInfo.Length is a long, so cast it for ReadBytes(int)
    int numBytes = (int)new FileInfo(fileName).Length;
    buff = br.ReadBytes(numBytes);
}
Then you upload it to the DB like anything else. I'm assuming you are using a varbinary column (BLOB).
FILESTREAM would be the way to go, but since you're using SQL Server 2005 you will have to read the file into memory instead, which consumes a lot of resources.
First of all, the column type varchar(max) is your friend: it gives you ~2 GB of data to play with, which is plenty for most uses.
Next, read the data into a byte array and convert it to a Base64 string:
FileInfo _fileInfo = new FileInfo(openFileDialog1.FileName);
string _data = null;
if (_fileInfo.Length < int.MaxValue) // int.MaxValue = 2,147,483,647 bytes, about 2 GiB
{
    // File.ReadAllBytes opens the file, reads it completely, and closes it
    byte[] _fileData = File.ReadAllBytes(_fileInfo.FullName);
    _data = Convert.ToBase64String(_fileData);
}
else
{
    MessageBox.Show("File is too large for database.");
}
Reverse the process to recover the bytes:
byte[] _fileData = Convert.FromBase64String(_data);
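Base64 makes this round trip work, but it costs space: every 3 bytes become 4 characters, so the stored string is about a third larger than the file. A file just under 2 GiB would need roughly 2.86 billion characters, more than varchar(max)'s limit of 2^31 - 1 bytes. A Python 3 sketch of the round trip and the size arithmetic (the function names are hypothetical):

```python
import base64


def to_base64_text(data: bytes) -> str:
    """Encode bytes as Base64 text, e.g. for storage in a text column."""
    return base64.b64encode(data).decode("ascii")


def from_base64_text(text: str) -> bytes:
    """Reverse of to_base64_text: recover the original bytes."""
    return base64.b64decode(text)


def base64_length(num_bytes: int) -> int:
    """Characters needed to Base64-encode num_bytes, including '=' padding.

    Every group of up to 3 input bytes becomes 4 output characters.
    """
    return 4 * ((num_bytes + 2) // 3)
```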
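You'll want to release those strings as soon as you have finished with them (for example, by setting the variables to string.Empty or null) so the garbage collector can reclaim the memory; strings themselves cannot be disposed.
But if you can, just upgrade to SQL Server 2008 and use FILESTREAM.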
If you're using SQL Server 2008, you could use FILESTREAM (getting started guide here). An example of using this functionality from C# is here.
You would need to read the file into a byte array and then store it in a BLOB field in the database, possibly along with the name you want to give the file and its file type.
To get the file back out, just reverse the process.
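A minimal sketch of storing a file as a BLOB together with its name and type, then reading it back. It uses Python's built-in sqlite3 module purely as a stand-in for SQL Server; the files table and its columns are hypothetical, and with SQL Server you would use a varbinary(max) column through your own data-access library instead.

```python
import sqlite3


def create_table(conn: sqlite3.Connection) -> None:
    """Create a hypothetical 'files' table with a BLOB column for the bytes."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "id INTEGER PRIMARY KEY, name TEXT, file_type TEXT, content BLOB)"
    )


def save_file(conn: sqlite3.Connection, name: str, file_type: str, data: bytes) -> int:
    """Store the file's bytes together with its name and type; return the row id."""
    cur = conn.execute(
        "INSERT INTO files (name, file_type, content) VALUES (?, ?, ?)",
        (name, file_type, data),  # parameters avoid SQL injection and escaping issues
    )
    conn.commit()
    return cur.lastrowid


def load_file(conn: sqlite3.Connection, file_id: int) -> tuple:
    """Reverse the process: return (name, file_type, bytes) for a stored file."""
    name, file_type, content = conn.execute(
        "SELECT name, file_type, content FROM files WHERE id = ?", (file_id,)
    ).fetchone()
    return name, file_type, bytes(content)
```

Storing the bytes directly in a BLOB avoids the one-third Base64 overhead of the varchar(max) approach.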