Storing an SVG as bytes in DB?

Wee bit of background to set the scene: we've told a client he can provide us with images of any type and we'll put them on his reports. I've just had a quick run-through of doing this and found that the reports (and other things between me and them) are all designed to only use SVGs.
I thought I'd struck gold when I found online that you can convert an image from a JPG or PNG into an SVG using free tools. Alas, I've not yet succeeded in getting an SVG stored as bytes in the DB in a format that lets me use it again after reading it back out.
Here's a quick timeline of what followed leading up to my problem.
1. I use an online tool to generate an SVG from a PNG (e.g., MobileFish).
2. I can open and view it in Firefox and it looks ok.
3. I ultimately need the SVG to be stored as bytes in the DB, from where the report will pull it via a webpage that serves it up as SVG. So I write it as bytes into a SQL data script. The code I use to write these bytes is included below.
4. I visit said webpage and I get an error that there is an "XML parsing error on Line 1 Column 1" and it shows some of my bytes. They begin "3C73".
5. I revisit the DB and compare the bytes I've just written there with some pre-existing (and working) ones. While my new ones begin "3C73", the others begin "0xFFFE".
I think I've probably just pointed out something really fundamental but it hasn't clicked.
Can someone tell me what I've done that means my bytes aren't stored in the correct encoding/format?
When I open my new SVG in Notepad++ I can see the content includes the following which could be relevant :
<image width="900" height="401" xLink:href="data:image/png;base64,
(base 64 encoded data follows for 600+ lines)
Here's the brains of the code that turns my SVG into the bytes to be stored in DB :
var bytes = File.ReadAllBytes(file);
using (var fs = new StreamWriter(file + ".txt"))
{
    foreach (var item in bytes)
    {
        fs.Write(String.Format("{0:X2}", item));
    }
}
Any help appreciated.
Cheers

Two things:
SVGs are vector images, not bitmap files. All that online tool is doing is taking a bitmap (a JPEG or, in your case, a PNG) and creating an SVG file with that bitmap embedded in it. You aren't really getting the benefit of a true SVG image. If you realise and understand that, then no worries.
SVG files are just text. In theory there is no reason you can't just store them as strings in your db. As long as the column is big enough. However normally if you are storing unstructured files in a db, the preferred column type to use is a "Blob".
http://technet.microsoft.com/en-us/library/bb895234.aspx
Converting your SVG file to hex is just making things slower and doubling the size of your files. Also when you convert back, you have to be very careful about the string encoding you are using. Which, in fact, sounds like the problem you are having.
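The bytes quoted in the question hint at where the encoding goes wrong: FF FE is the UTF-16 little-endian byte order mark, while 3C 73 is simply "<s" in ASCII/UTF-8. Assuming the page that serves the SVG decodes the column as UTF-16 (the thread does not confirm this), a minimal sketch of producing hex that matches the working rows could look like the following. The class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class SvgHexDump
{
    // Re-encodes SVG text as UTF-16 LE, prefixed with the FF FE byte order mark,
    // and returns it as a hex literal such as 0xFFFE3C00...
    // ASSUMPTION: the consumer expects UTF-16 LE with a BOM, as the working rows suggest.
    public static string ToUtf16HexWithBom(string svgText)
    {
        var encoding = new UnicodeEncoding(false, true); // little-endian, emit BOM
        byte[] preamble = encoding.GetPreamble();        // FF FE
        byte[] body = encoding.GetBytes(svgText);        // GetBytes does not add the BOM itself

        var sb = new StringBuilder("0x");
        AppendHex(sb, preamble);
        AppendHex(sb, body);
        return sb.ToString();
    }

    private static void AppendHex(StringBuilder sb, byte[] bytes)
    {
        foreach (byte b in bytes)
        {
            sb.Append(b.ToString("X2"));
        }
    }
}
```

With the question's variables, this might be called as File.WriteAllText(file + ".txt", SvgHexDump.ToUtf16HexWithBom(File.ReadAllText(file))). File.ReadAllText decodes the source file to text first, so the hex no longer depends on how the online tool happened to encode the SVG.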

I suspect you are doing it incorrectly. SVG is simply an XML-based vector image format. I guess your application might be using the SVG image element, and you need to convert your PNG image to a base64-encoded string.

Related

C# PdfImage LibTiff iTextSharp G3 / G4 compression

I have a service that takes a pdf document, resizes all the images, and replaces it in the pdf. The problem that I'm getting at, is the compression.
Some documents are scanned and saved with a Compression.CCITTFAX3 compression and some are saved with a Compression.CCITTFAX4 compression. I am using iTextSharp and convert the stream bytes to a Tiff, otherwise the image becomes funky because of stride or something.
Below is the code I'm currently making use of to check for the correct filter, and then convert to tiff image.
if (filter == "/CCITTFaxDecode")
{
    byte[] data = PdfReader.GetStreamBytesRaw((PRStream)stream);
    using (MemoryStream ms = new MemoryStream())
    {
        using (Tiff myTiff = Tiff.ClientOpen("in-memory", "w", ms, new TiffStream()))
        {
            myTiff.SetField(TiffTag.IMAGEWIDTH, UInt32.Parse(dict.Get(PdfName.WIDTH).ToString()));
            myTiff.SetField(TiffTag.IMAGELENGTH, UInt32.Parse(dict.Get(PdfName.HEIGHT).ToString()));
            myTiff.SetField(TiffTag.COMPRESSION, Compression.CCITTFAX3);
            myTiff.SetField(TiffTag.BITSPERSAMPLE, UInt32.Parse(dict.Get(PdfName.BITSPERCOMPONENT).ToString()));
            myTiff.SetField(TiffTag.SAMPLESPERPIXEL, 1);
            myTiff.WriteRawStrip(0, data, data.Length);
            myTiff.Flush();
            using (System.Drawing.Image img = new Bitmap(ms))
            {
                if (img == null) continue;
                ReduceResolution(stream, img, quality);
            }
            myTiff.Close();
        }
    }
}
Just to make sure that you understand my question...
I want to find out how I know when to use G3 compression and when to use G4 compression.
Keep in mind that I've tried every code sample I could find.
This is quite important, as we interface with banking systems, and the files uploaded are sent to them as FICA documents.
Please help...

You need to go low level and inspect the image dictionary. The /DecodeParms entry is a dictionary that contains several keys related to CCITT compression. The /K key specifies the compression type: -1 is G4, 0 is G3 1D and 1 is G3 2D.
Update: to be more exact a negative value, usually -1, is G4, 0 is G3 1D and a positive value, usually 1, is G3 2D. To answer your question in the comment, the /K entry is optional and if it is missing the default value is considered to be 0.
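The rule above can be sketched as a small pure function. The enum and class names here are hypothetical, and reading the /K value out of the /DecodeParms dictionary is left to the caller; a missing /K is passed as null and treated as 0, as described in the update.

```csharp
using System;

public static class CcittDecodeParms
{
    // Hypothetical labels for the three cases the /K entry distinguishes.
    public enum CcittScheme
    {
        Group3OneDimensional,
        Group3TwoDimensional,
        Group4
    }

    // Maps the /K value to a CCITT scheme:
    //   negative -> G4, zero or missing -> G3 1D, positive -> G3 2D.
    public static CcittScheme FromK(int? k)
    {
        int value = k ?? 0; // /K is optional and defaults to 0
        if (value < 0) return CcittScheme.Group4;
        if (value == 0) return CcittScheme.Group3OneDimensional;
        return CcittScheme.Group3TwoDimensional;
    }
}
```

In the question's code, the result would decide whether TiffTag.COMPRESSION is set to Compression.CCITTFAX4 or Compression.CCITTFAX3 instead of the hard-coded CCITTFAX3.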

I would not advise inserting the data direct. I base this assertion on many years of practical experience of PDFs and TIFF in products like ABCpdf .NET (on which I work).
While in theory you should be able to move the data over direct, minor differences between the formats of the compressed data are likely to lead to occasional mismatches.
The fact that some Fax TIFFs contain data which will display correctly in a TIFF viewer but not in a PDF one leads me to suspect that the same kind of problem is likely to operate in the other direction too.
I'm not going to say this kind of problem is common but it is the kind of thing I wouldn't rely on if I was in a bank. Unless you are very sure your data source will be uniform I would suggest it is much safer to decompress and recompress.
I would also note that sometimes images are held inline in the content stream rather than in a separate XObject. Again this is something you will need to cope with unless your data source produces a standard format which you can be sure will not contain this kind of structure.

Thank you for the replies above. The solution from Mihai seems viable if you do have all the information from the stream. I found that iTextSharp does not do this properly, so I ended up buying pdf4net. That was much simpler than trying to figure out what the better solution was; besides, it ended up cheaper than the time I spent on this.
OnceUponATime.... Thank you for the information given above.
PDF4Net has a built-in method that gets all the images per page... This sorted my issues, whereas I had tried to do this myself using iTextSharp and the examples that were given to me.

How to write a file format handler

Today I'm cutting video at work (yay me!), and I came across a strange video format: a MOD file with a companion MOI file.
I found this article online from the wiki, and I wanted to write a file format handler, but I'm not sure how to begin.
I want to write a file format handler to read the information files, has anyone ever done this and how would I begin?
Edit:
Thanks for all the suggestions, I'm going to attempt this tonight, and I'll let you know. The MOI files are not very large, maybe 5KB in size at most (I don't have them in front of me).

You're in luck in that the MOI format at least spells out the file definition. All you need to do is read in the file and interpret the results based on the file definition.
Following the definition, you should be able to create a class that could read and interpret a file which returns all of the file format definitions as properties in their respective types.
Reading the file requires opening the file and generally reading it on a byte-by-byte progression, such as:
using (FileStream fs = File.OpenRead(pathToYourFile)) {
    while (true) {
        int b = fs.ReadByte();
        if (b == -1) {
            break;
        }
        // Interpret byte or bytes here....
    }
}
Per the wiki article's referenced PDF, it looks like someone already reverse engineered the format. From the PDF, here's the first entry in the format:
Hex-Address: 0x00
Data Type: 2 Byte ASCII
Value (Hex): "V6"
Meaning: Version
So, a simplistic implementation could pull the first 2 bytes of data from the file stream and convert to ASCII, which would provide a property value for the Version.
Next entry in the format definition:
Hex-Address: 0x02
Data Type: 4 Byte Unsigned Integer
Value (Hex):
Meaning: Total size of MOI-file
Interpreting the next 4 bytes and converting to an unsigned int would provide a property value for the MOI file size.
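A minimal sketch of those two fields, assuming the 4-byte size is stored big-endian (the format description quoted here does not state the byte order, so check it against a real file). MoiHeader and its members are hypothetical names.

```csharp
using System;
using System.Text;

public sealed class MoiHeader
{
    public string Version { get; private set; }
    public uint FileSize { get; private set; }

    // Parses the first two entries of the format definition:
    // offset 0x00: 2-byte ASCII version, offset 0x02: 4-byte unsigned file size.
    public static MoiHeader Parse(byte[] data)
    {
        if (data == null || data.Length < 6)
        {
            throw new ArgumentException("Need at least 6 bytes for the version and size fields.", "data");
        }

        var header = new MoiHeader();
        header.Version = Encoding.ASCII.GetString(data, 0, 2);

        // ASSUMPTION: big-endian byte order; reverse the shifts if the file is little-endian.
        header.FileSize = ((uint)data[2] << 24) | ((uint)data[3] << 16)
                        | ((uint)data[4] << 8) | data[5];
        return header;
    }
}
```

Since the MOI files are only a few kilobytes, File.ReadAllBytes(path) is probably the simplest way to obtain the array to pass to Parse. Comparing FileSize with the actual file length is a quick check on the byte-order assumption.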
Hope this helps.

If the files are very large and just need to be streamed in, I would create a new reader object that uses an UnmanagedMemoryStream to read the information in.
I've done a lot of different file format processing like this. More recently, I've taken to making a lot of my readers more functional, where reading tends to use 'yield return' to return read-only objects from the file.
However, it all depends on what you want to do. If you are trying to create a general-purpose format for use in other applications or create an API, you probably want to conform to an existing standard. If, however, you just want to get data into your own application, you are free to do it however you want. You could use a BinaryReader on the stream and construct the information you need within your app, or get the reader to return objects representing the contents of the file.
The one thing I would recommend: make sure it implements IDisposable and you wrap it in a using!
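As a rough illustration of that style, here is a sketch of a reader that wraps a BinaryReader, hands back records lazily with 'yield return', and implements IDisposable. The fixed-size record layout and the FixedRecordReader name are assumptions made for the example, not part of any real format.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public sealed class FixedRecordReader : IDisposable
{
    private readonly BinaryReader _reader;
    private readonly int _recordSize;

    public FixedRecordReader(Stream stream, int recordSize)
    {
        if (stream == null) throw new ArgumentNullException("stream");
        if (recordSize <= 0) throw new ArgumentOutOfRangeException("recordSize");
        _reader = new BinaryReader(stream);
        _recordSize = recordSize;
    }

    // Lazily yields one complete record at a time; a trailing partial record is dropped.
    public IEnumerable<byte[]> ReadRecords()
    {
        while (true)
        {
            byte[] record = _reader.ReadBytes(_recordSize);
            if (record.Length < _recordSize)
            {
                yield break; // end of stream reached
            }
            yield return record;
        }
    }

    // Disposing the BinaryReader also closes the underlying stream.
    public void Dispose()
    {
        _reader.Dispose();
    }
}
```

A caller would typically write using (var reader = new FixedRecordReader(File.OpenRead(path), 16)) and iterate reader.ReadRecords() inside that block, so the file handle is released even if parsing throws.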

Reading image from Access - parameter not valid

I have a simple database in an Access .mdb file, but I don't know how to deal with the "parameter not valid" exception I get when I'm creating an Image from a stream.
I've read that I need to strip a 78-byte header, but I still get a "parameter not valid" error when I call FromStream, even after stripping off the first 78 bytes.
This doesn't work for me:
byte[] abytPic = (byte[])dt.Rows[0]["Photo"]; // byte array with image
if ((abytPic[0] == 21) && (abytPic[1] == 28)) // It's true
{
    byte[] abytStripped = new byte[abytPic.Length - 78];
    System.Buffer.BlockCopy(abytPic, 78, abytStripped, 0, abytPic.Length - 78);
    msPic = new MemoryStream(abytStripped);
}

If you are reading the data directly from MS Access, you do not need to strip any header information.
Assuming the image is stored as a BLOB, which is the most common, here is code to read in the array of bytes from the database and store as an image file (sorry, VB instead of C#):
Dim varBytes() As Byte
Using cn As New OleDbConnection(myConnectionString)
    cn.Open()
    sqlText = "SELECT [myColumn] " _
        & "FROM [myTable] " _
        & "WHERE ([mySearchCriteria] = '" & mySearchTerm & "')"
    Using cm As New OleDbCommand(sqlText, cn)
        Dim rdr As OleDbDataReader
        rdr = cm.ExecuteReader
        rdr.Read()
        varBytes = rdr.GetValue(0)
    End Using
End Using
My.Computer.FileSystem.WriteAllBytes(myPath & "\myFile.emf", varBytes, True)
This example, which I had lying around, is one where I knew the files in the database were .emf images. If you know the extension, you can put it on the file name. If you don't, you can leave it blank and then open the resulting file with an image viewer; it should open. If you need to find the extension or file type, once it is saved as a file, you can open it with any hex editor and the file type will be available from the header information.
Your question is a little bit unclear, so I'm not sure the above code is exactly what you want, but it should get you a lot closer.
EDIT:
This is the VB code that takes the array of bytes, loads it into a MemoryStream object, and then creates an Image object from the Stream. This bit of code worked just fine, and displayed the image in a picturebox on my form.
Dim img As Image
Dim str As New MemoryStream(varBytes)
img = Image.FromStream(str)
PictureBox1.Image = img
If the C# equivalent of this is not working for you, then the problem is likely in how the image is stored in the MS Access database.
EDIT:
If the image in your database is stored as a 'Package' and not a 'Long binary data', then you will need to strip the header information that MS Access adds. I've been playing with the 'Package' type of image storage with a simple .jpg file. The header in this case is much longer than 78 bytes. In this instance, it's actually 234 bytes, and MS Access also added some information to the end of the original file; about 292 bytes in this case.
It looks like your original approach was correct, you will just need to determine how many bytes to strip off the front and rear of the Byte array for your situation.
I determined it for my file by comparing the original image file, and the file exported from the database, (not to a Stream object, see my first code) in a hex editor. Once I figured out how much information (header and footer) was added by MS Access, I then knew how many bytes needed to be stripped.
EDIT:
The size of the header added by MS Access when the image is stored as 'Package' varies depending on the file type, and the original location (full path information) of the image when it was dumped into the MS Access database. So, even for the same file type, you may have a different number of bytes to strip from the header for each file.
This makes it a lot more difficult, because then you will have to scan the byte array until you find the normal start-of-file information for that file type, and then strip everything before it.
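A sketch of that scan in C#, using the JPEG start-of-file bytes FF D8 FF as the signature. The class and method names are hypothetical, and the sketch only removes the leading header: the trailing bytes that MS Access appends are left in place, so check whether your image loader tolerates them.

```csharp
using System;

public static class OlePackageStripper
{
    // Start-of-file bytes for a JPEG; other file types need their own signature.
    public static readonly byte[] JpegSignature = { 0xFF, 0xD8, 0xFF };

    // Returns the index of the first occurrence of signature in data, or -1 if absent.
    public static int IndexOf(byte[] data, byte[] signature)
    {
        for (int i = 0; i <= data.Length - signature.Length; i++)
        {
            int j = 0;
            while (j < signature.Length && data[i + j] == signature[j])
            {
                j++;
            }
            if (j == signature.Length)
            {
                return i;
            }
        }
        return -1;
    }

    // Copies everything from the signature onward; returns null when no signature is found.
    public static byte[] StripHeader(byte[] data, byte[] signature)
    {
        int start = IndexOf(data, signature);
        if (start < 0)
        {
            return null;
        }
        byte[] result = new byte[data.Length - start];
        Buffer.BlockCopy(data, start, result, 0, result.Length);
        return result;
    }
}
```

In the question's code, this would stand in for the fixed 78-byte offset: pass abytPic and the signature to StripHeader, and build the MemoryStream from the returned array when it is not null.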
All this headache is one of the reasons that it is better to store images as BLOBs 'Long binary data' in a database. Retrieval is much easier. I don't know if you have the option to do this, but if so, it would be a good idea.

I do not believe that your problem is with the database. "Parameter not valid" exceptions when dealing with imaging can be a total pain as I have dealt with them before. They're not very clear on what the problem is.
How exactly was the image placed into the database? There could be a problem with writing the image into the database before you've even attempted to pull it. Also, what file type is the image?
EDIT Here is some sample code that I've used before to get an image from a byte array.
// Takes an array of bytes and converts them to an image.
private Image getImageFromBytes(byte[] myByteArray)
{
    System.IO.MemoryStream newImageStream = new System.IO.MemoryStream(myByteArray, 0, myByteArray.Length);
    return Image.FromStream(newImageStream, true);
}

How to read and modify the colorspace of an image in c#

I'm loading a Bitmap from a jpg file. If the image is not 24-bit RGB, I'd like to convert it. The conversion should be fairly fast. The images I'm loading can be huge (up to 9000*9000 pixels, with a compressed size of 40-50MB). How can this be done?
Btw: I don't want to use any external libraries if possible. But if you know of an open source utility class performing the most common imaging tasks, I'd be happy to hear about it. Thanks in advance.

The JPEG should start with 0xFF 0xD8. After that you will find various fields in the format:
Field identifier: 2 bytes.
Field length, excluding the field identifier: 2 bytes.
Variable data.
Parse through the fields. The identifier you are looking for is 0xFF 0xC0. This is called SOF0, and it contains the height, width, bit depth, etc. 0xFF 0xC0 is followed by two bytes for the field length. Immediately after that comes a single byte giving the bit depth, which is usually 8. Then there are two bytes for the height, two for the width, and a single byte for the number of components; this is usually 1 (for greyscale) or 3 (for color).
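The walk through the fields described above could be sketched like this. It assumes the whole file (or at least its header) is already in a byte array, that multi-byte values are big-endian as JPEG uses, and that no padding bytes or length-less markers appear before the frame header. JpegHeader and FrameInfo are hypothetical names. Besides SOF0 (0xC0), the sketch also accepts 0xC2, the frame marker used by progressive JPEGs, which has the same layout.

```csharp
using System;
using System.IO;

public static class JpegHeader
{
    public sealed class FrameInfo
    {
        public int BitsPerSample { get; set; }
        public int Height { get; set; }
        public int Width { get; set; }
        public int Components { get; set; }
    }

    // Walks the marker segments and returns the frame header fields,
    // or null if no SOF0/SOF2 segment is found.
    public static FrameInfo ReadFrameInfo(byte[] jpeg)
    {
        if (jpeg == null || jpeg.Length < 4 || jpeg[0] != 0xFF || jpeg[1] != 0xD8)
        {
            throw new InvalidDataException("Not a JPEG: missing FF D8 start marker.");
        }

        int pos = 2;
        while (pos + 4 <= jpeg.Length)
        {
            if (jpeg[pos] != 0xFF)
            {
                throw new InvalidDataException("Expected a marker at offset " + pos + ".");
            }
            byte marker = jpeg[pos + 1];
            int length = (jpeg[pos + 2] << 8) | jpeg[pos + 3]; // excludes the 2 marker bytes

            if (marker == 0xC0 || marker == 0xC2)
            {
                if (pos + 10 > jpeg.Length)
                {
                    throw new InvalidDataException("Truncated frame header.");
                }
                return new FrameInfo
                {
                    BitsPerSample = jpeg[pos + 4],
                    Height = (jpeg[pos + 5] << 8) | jpeg[pos + 6],
                    Width = (jpeg[pos + 7] << 8) | jpeg[pos + 8],
                    Components = jpeg[pos + 9]
                };
            }
            pos += 2 + length; // skip marker plus segment
        }
        return null;
    }
}
```

A Components value of 4 would suggest a CMYK-style image, which is the case the question needs to detect before converting to 24-bit RGB.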

This isn't something I've tried myself, but I think you might need to access the picture's EXIF information as a start.
Check out Scott Hanselman's blog-entry on accessing EXIF information from pictures.

The standard .NET System.Drawing namespace should have all that you need, but it probably won't be very efficient. It'll load the whole thing into RAM, uncompress it, convert it (probably by making a copy) and then re-compress and save it. If you aim for high performance, I'm afraid you might need to look into C/C++ libraries and make .NET wrappers for them.

As far as I know, jpg is always 24 bpp. The only thing that could change is that it's CMY(K?) rather than RGB. That information would be stored in the header. Unfortunately I don't have any means of creating a CMYK image to test whether loading into a Bitmap will convert it automatically.
The following line will read the file into memory:
Bitmap image = new Bitmap(fileName);
image.PixelFormat will tell you the image format. However, I can't test what the file load does with files other than 24bpp RGB jpgs. I can only recommend that you try it out.

iphone: making a new UIImage from data from an XML document

I'm prototyping a video streaming client for the iPhone that gets its content from a webserver written in C#.
The server outputs an XML document where the jpg data for the image is stored in one of the tags (). It writes it out using WriteBase64.
On the iPhone, I'm using libxml to parse the xml and storing the bytes for the image in an NSString.
The next step is to create an NSData object from the data and then a new UIImage using its initWithData: method.
However, each time I try to create a new image, the result is a nil object indicating failure. My best guess is that there is something I need to do to convert the NSString back somehow.
Please help!!

Are you properly base64-decoding the string from your XML? There is not a native way to do this in Objective-C that I am aware of, but there is a good discussion on how to roll your own here: http://www.cocoadev.com/index.pl?BaseSixtyFour

An alternate approach would be to attach the image binary data directly after the XML stream and parse it out yourself - base64 encoding really expands the size of the image you are transferring.
