Converting binary data to .doc file with the images inside it - c#

I have binary data stored in database which I need to convert them back for backup purposes. Most of them are .doc files with images attached in document. My method to restore them is to write binary data to byte string and write those bytes to the file like mydoc.doc. The problem is, it works for txt files and it actually works for text part of .doc file as well. Since most of the .doc files contain jpeg attached, after conversion I get some readable text and random characters which I believe are there for picture attached in doc file. Any help is appreciated. Thanks in advance...
Note: Binary data is stored in image data type in database. Database contains file path and name (which doesn't exist now) and corresponding binary data stored in image type, so from path I can detect the file type that it was before... some of them are .txt (which I was able to convert perfectly), some of them are .doc (which is problem because of attahcmens inside it)
Here is my code:
string s = "D0CF11E0A1B11AE100000000000000000000"; // note: string is for example
var bytes = GetBytesFromByteString(s).ToArray();
File.WriteAllBytes("C:\\temp\\test.doc", bytes);

A .doc file is not a string or even a text or ASCII. It is a raw binary file format.
So if your database cell contains a BLOB (Binary Large Object) simply treat it as an array of bytes and write it out to a (binary) file. No conversions, no encodings, nothing.
Edit
Whoever designed this database, they designed to store all kinds of files as an image (in the sense of memory-dump-image) i.e. a series of raw bytes in a cell of type image.
You should treat these bytes exactly as mentioned above: A series of raw bytes.

Related

How to decode MySql blob from backup file?

I got a MySql backup file containing a column like:
CREATE TABLE `person` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`profilepicture` blob,
)
The content of this row profilepicture is encoded in the backup file to some kind of dump, starting (in a text editor) like this:
INSERT INTO `tbox_account_transaction` VALUES (8,'?\?\?0JFIF\0\0\0\0\0\0[...]');
How can I decode this from the .sql backup file in c#? It is not Base64, but I do not recognize what else it might be. I don't want to restore the backup.sql file to MySql but instead decode the blob from the backup file.
The bytes you are seeing are almost certainly the initial bytes of a JPEG image. The readable string "JFIF" starting at the fifth byte is the clue (see https://en.wikipedia.org/wiki/JPEG_File_Interchange_Format).
Many of the other binary bytes are null or else non-printable characters. All the weird \? or \0 sequences are the client's best effort at representing these bytes on your text display.
The easiest way to access this data is to restore the dump file to a MySQL instance (even a local one running on a laptop), and then use SQL in your C# code to access the binary contents of the profilepicture blob.
Either save the data from one blob to a .jpg file as #Tommaso Belluzzo suggests, or else just display it directly, if you can. I'm not a C# programmer, but it seems there's a Bitmap class for this (see https://msdn.microsoft.com/en-us/library/8tda2c3c(v=vs.85).aspx).
It's outside the scope of a Stack Overflow answer to give a tutorial on how to write SQL in a C# application. There are plenty of resources available for that.
That looks like a jpg file header. Try the following (blob is a String typed variable containing the profilepicture):
using (MemoryStream stream = new MemoryStream(buffer))
{
Image image = Image.FromStream(stream, true, true);
image.Save("C:\\Test.jpg", ImageFormat.Jpeg);
}

Storing an SVG as bytes in DB?

Wee bit of background to set the scene : we've told a client he can provide us with images of any type and we'll put them on his reports. I've just had a quick run through of doing this and found that the reports (and other things between me and them) are all designed to only use SVGs.
I thought I'd struck gold when I found online that you can convert an image from a jpg or PNG into an SVG using free tools but alas I've not yet succeeded in getting an SVG stored as bytes in the DB in a format that allows me to use it again after reading it back out.
Here's a quick timeline of what followed leading up to my problem.
I use an online tool to generate an SVG from a PNG (e.g., MobileFish)
I can open and view it in Firefox and it looks ok
I ultimately need the SVG to be stored as bytes in the DB, from where the report will pull it via a webpage that serves it up as SVG. So I write it as bytes into a SQL data script. The code I use to write these bytes is included below.
I visit said webpage and I get an error that there is an "XML parsing error on Line 1 Column 1" and it shows some of my bytes. They begin "3C73"
I revisit the DB and compare the bytes I've just written there with some pre-existing (and working ones). While my new ones begin "3C73", the others begin "0xFFFE".
I think I've probably just pointed out something really fundamental but it hasn't clicked.
Can someone tell me what I've done that means my bytes aren't stored in the correct encoding/format?
When I open my new SVG in Notepad++ I can see the content includes the following which could be relevant :
<image width="900" height="401" xLink:href="data:image/png;base64,
(base 64 encoded data follows for 600+ lines)
Here's the brains of the code that turns my SVG into the bytes to be stored in DB :
var bytes = File.ReadAllBytes(file);
using (var fs = new StreamWriter(file + ".txt"))
{
foreach (var item in bytes)
{
fs.Write(String.Format("{0:X2}",item));
}
}
Any help appreciated.
Cheers
Two things:
SVGs are vector images, not bitmap files. All that online tool is doing is taking a JPEG and creating a SVG file with a JPEG embedded in it. You aren't really getting the benefit of a true SVG image. If you realise and understand that, then no worries.
SVG files are just text. In theory there is no reason you can't just store them as strings in your db. As long as the column is big enough. However normally if you are storing unstructured files in a db, the preferred column type to use is a "Blob".
http://technet.microsoft.com/en-us/library/bb895234.aspx
Converting your SVG file to hex is just making things slower and doubling the size of your files. Also when you convert back, you have to be very careful about the string encoding you are using. Which, in fact, sounds like the problem you are having.
I am suspecting you are doing it incorrectly. SVG is simply and XML based vector image format. I guess your application might be using SVG image element and you need to convert your png image to base64 encoded string .

Matplotlib savefig to BytesIO is slightly wrong?

I'm trying to save matplotlib figures to a memory stream, exactly as in another example on SO:
import matplotlib.pyplot as plt
import io
plt.figure()
plt.plot([1, 2])
plt.title("test")
buf = io.BytesIO()
plt.savefig(buf, format = 'png')
plt.savefig("real.png", format = 'png')
buf.seek(0)
data = buf.read()
buf.close()
f = open('copy.png', 'w')
f.write(data)
f.close()
I find that copy.png is slightly larger in size and applications refuse to open it. Is this some sort of encoding issue?
Background:
I'm trying to use python.net to render graphs with matplotlib and pass them out to C# for drawing. I want to avoid writing the images to disk. Ideally, I want to write to some sort of byte array that I can work with in C#.
Try opening the file in binary mode.
f = open('copy.png', 'wb')
From the documentation:
Python on Windows makes a distinction between text and binary files;
the end-of-line characters in text files are automatically altered
slightly when data is read or written. This behind-the-scenes
modification to file data is fine for ASCII text files, but it’ll
corrupt binary data like that in JPEG or EXE files. Be very careful to
use binary mode when reading and writing such files.

Why binary data writing's result is string?

I want to create a binary file and store string data in it, I used this sample:
FileStream fs = new FileStream("c:\\test.data", FileMode.Create);
BinaryWriter bw = new BinaryWriter(fs);
bw.Write(Encoding.ASCII.GetBytes("david stein"));
bw.Close();
but when I opened created file by this sample (test.data) in notepad, it has string data in it ("david stein"), now my question is that whats the difference between this binary writing and text writing when the result is string?
I'm looking to create a data in binary file until user can not open and read my data by note pad and if user open it in notepad see real binary data .
in some files when you open theme in text editors you can not read file content like jpg files contents,they do not use any encryption methods,what about it?how can i wite my data like this?
now my question is that whats the difference between this binary writing and text writing when the result is string?
The data in a file is always "a sequence of bytes". In this case, the sequence of bytes you've written is "the bytes representing the text 'david stein'" in the ASCII encoding. So yes, if you open the file in any editor which tries to interpret the bytes as text in a way which is compatible with ASCII, you'll see the text "david stein". Really it's just a load of bytes though - it all depends on how you interpret them.
If you'd written:
File.WriteAllText("c:\\test.data", "david stein", Encoding.ASCII);
you'd have ended up with the exact same sequence of bytes. There are any number of ways you could have created a file with the same bytes in. There's nothing about File.WriteAllText which "marks" the file as text, and there's nothing about FileStream or BinaryWriter which marks the file as binary.
EDIT: From comments:
I'm looking to create a data in binary file until user can not open and read my data by note pad
Well, there are lots of ways of doing that with different levels of security. Ideally, you'd want some sort of encryption - but then the code reading the data would need to be able to decrypt it as well, which means it would need to be able to get a key. That then moves the question to "how do I store a key".
If you only need to verify the data in the file (e.g. check that it matches something from elsewhere) then you could use a cryptographic hash instead.
If you only need to prevent the most casual of snoopers, you could use something which is basically just obfuscation - a very weak form of encryption with no "key" as such. Anyone who dceompiled your code would easily be able to get at the data in that case, but you may not care too much about that.
It all depends on your requirements.
All data is binary. A text file is binary data that happens to be a limited subset that represent valid characters, but it's still binary.
The way text editors typically differentiate a text file from a binary file is they scan a certain portion of the file for zero values, \0. These never exist in text-only files and almost always exist in binary files.

How to write a file format handler

Today i'm cutting video at work (yea me!), and I came across a strange video format, an MOD file format with an companion MOI file.
I found this article online from the wiki, and I wanted to write a file format handler, but I'm not sure how to begin.
I want to write a file format handler to read the information files, has anyone ever done this and how would I begin?
Edit:
Thanks for all the suggestions, I'm going to attempt this tonight, and I'll let you know. The MOI files are not very large, maybe 5KB in size at most (I don't have them in front of me).
You're in luck in that the MOI format at least spells out the file definition. All you need to do is read in the file and interpret the results based on the file definition.
Following the definition, you should be able to create a class that could read and interpret a file which returns all of the file format definitions as properties in their respective types.
Reading the file requires opening the file and generally reading it on a byte-by-byte progression, such as:
using(FileStream fs = File.OpenRead(path-to-your-file)) {
while(true) {
int b = fs.ReadByte();
if(b == -1) {
break;
}
//Interpret byte or bytes here....
}
}
Per the wiki article's referenced PDF, it looks like someone already reverse engineered the format. From the PDF, here's the first entry in the format:
Hex-Address: 0x00
Data Type: 2 Byte ASCII
Value (Hex): "V6"
Meaning: Version
So, a simplistic implementation could pull the first 2 bytes of data from the file stream and convert to ASCII, which would provide a property value for the Version.
Next entry in the format definition:
Hex-Address: 0x02
Data Type: 4 Byte Unsigned Integer
Value (Hex):
Meaning: Total size of MOI-file
Interpreting the next 4 bytes and converting to an unsigned int would provide a property value for the MOI file size.
Hope this helps.
If the files are very large and just need to be streamed in, I would create a new reader object that uses an unmanagedmemorystream to read the information in.
I've done a lot of different file format processing like this. More recently, I've taken to making a lot of my readers more functional where reading tends to use 'yield return' to return read only objects from the file.
However, it all depends on what you want to do. If you are trying to create a general purpose format for use in other applications or create an API, you probably want to conform to an existing standard. If however you just want to get data into your own application, you are free to do it however you want. You could use a binaryreader on the stream and construct the information you need within your app, or get the reader to return objects representing the contents of the file.
The one thing I would recommend. Make sure it implements IDisposable and you wrap it in a using!

Categories

Resources