Converting an SQL image to Base64 in C#

I have a database storing binary JPEG images with two different file signatures (FFD8FFE0 and FFD8FFE1) and I would like to convert them into Base64 so I can use them in another application (Power BI). The data is stored in an IMAGE column; however, I only receive the data in a CSV file, import it into my tool as a string, and work with it from there.
For the file signature FFD8FFE0, I have no problem converting using the below code (from another Stack Overflow post - thank you):
public static string ToBase64(string sBinary)
{
    int noChars = sBinary.Length;
    byte[] bytes = new byte[noChars / 2];
    for (int i = 0; i < noChars; i += 2)
    {
        bytes[i / 2] = Convert.ToByte(sBinary.Substring(i, 2), 16);
    }
    return Convert.ToBase64String(bytes);
}
However, images with the file signature FFD8FFE1 are not converting and displaying properly. The code gives me an output, but the result does not display as an image.
Any advice? Is this because of the different file signature, or because of the size of the strings (they are noticeably larger)?
EDIT: Thank you everyone who assisted. As mentioned in the comments, the real issue was the data I was trying to convert - it was being truncated in the CSV. So for anyone who ever comes across this post, pull directly from SQL and not from a text file, as there is a good chance the data will be truncated.

It sounds like you exported the binary data from an admin tool's query results. Such tools will almost always truncate the displayed binary results to conserve memory.
It's better and easier to read the data directly from the database using ADO.NET or a micro-ORM like Dapper to reduce the boilerplate code.
Using Dapper you could write something as simple as:
var sql="select image from MyTable where Category=#category";
using var connection=new SqlConnection(connectionString);
var images=connection.Query<byte[]>(sql,new {category="Cats"});
And convert it with:
var cats64 = images.Select(bytes => Convert.ToBase64String(bytes));
Dapper will handle opening and closing the connection, so we don't even have to do that.
If you want to retrieve more fields, you can define a class to accept the results. Dapper will map the result columns to properties by name. Once you have a class, you can easily add a method to return the Base64 string:
class MyData
{
    public string Name { get; set; }
    public byte[] Image { get; set; }
    public string ToBase64() => Convert.ToBase64String(Image);
}
....
var images = connection.Query<MyData>("select Name, Image From ....", ...);
Using plain old ADO.NET needs a few more lines:
var sql="select image from MyTable where Category=#category";
using var connection=new SqlConnection(connectionString);
using var cmd=new SqlCommand(sql,connection);
cmd.Parameters.Add("#category",SqlDbType.NVarChar,20).Value="Cats";
using var reader=cmd.ExecuteReader();
while(reader.Read())
{
var image=(byte[])reader["image"];
...
}
With ADO.NET, though, it's also possible to read the data as a stream instead of loading everything into memory. This is very helpful when the image is large, because we avoid buffering the entire blob. We could write the stream directly to a file without first loading it into memory:
while(reader.Read())
{
using (var stream=reader.GetStream(0))
using (var file=File.Create(somePath))
{
stream.CopyTo(file);
}
}
There's no overload of Convert.ToBase64String that works with streams. It is possible, though, to use the CryptoStream class with a ToBase64Transform to encode an input stream as Base64, as this SO answer shows. Adapting it to this case:
while (reader.Read())
{
    using (var stream = reader.GetStream(0))
    using (var base64Stream = new CryptoStream(stream, new ToBase64Transform(), CryptoStreamMode.Read))
    using (var outputFile = File.Create(somePath))
    {
        // Requires an async enclosing method
        await base64Stream.CopyToAsync(outputFile).ConfigureAwait(false);
    }
}

If you are looking to convert to a Base64 JPEG data URL:
public static string ToBase64JpegUrl(byte[] bytes) =>
    $"data:image/jpeg;base64,{Convert.ToBase64String(bytes)}";

You are mapping the results from the database incorrectly; this field should be byte[], not string.
I assume you are receiving a hexadecimal representation of the bytes, which you could convert to bytes and then to Base64 as you attempted (Convert.ToBase64String(bytes)).
Try using EF Code First to read from the table and define the image property as byte[], as in the sketch below.
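A minimal sketch of that approach with EF6 Code First; the entity, table, and property names here are assumptions for illustration, not taken from the question:
// Sketch only: requires the EntityFramework package.
// using System; using System.Data.Entity; using System.Linq;

// Hypothetical entity -- maps the IMAGE/varbinary column to a byte[] property.
public class ImageRecord
{
    public int Id { get; set; }
    public byte[] Image { get; set; }
}

public class ImagesContext : DbContext
{
    public DbSet<ImageRecord> Images { get; set; }
}

// Usage: read the rows and convert each image to Base64.
using (var db = new ImagesContext())
{
    var base64Images = db.Images
        .AsEnumerable()
        .Select(r => Convert.ToBase64String(r.Image))
        .ToList();
}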

Related

Which FieldCodec to use when writing a RepeatedField byte per byte into a MemoryStream?

The protobuf SDK for C# contains an example project AddPerson.cs
It uses addressbook.proto, which defines a repeated field:
repeated PhoneNumber phones = 4;
I am trying to add a function to the AddPerson.cs, which would create an MD5 string out of the repeated field:
private string RepeatedFieldToMD5String<T>(RepeatedField<T> repeatedField)
{
    MemoryStream memoryStream = new MemoryStream();
    repeatedField.WriteTo(new CodedOutputStream(memoryStream), new FieldCodec<T>());
    return string.Concat
    (
        MD5.Create().ComputeHash
        (
            memoryStream
        ).Select(x => x.ToString("x2"))
    );
}
The problem is that there is no such constructor for FieldCodec; it wants to have some arguments.
My guess is that I have to tell it "please take a FieldCodec that would be suitable for writing an object byte by byte into an array"... but how do I say that in C# :-) ?
In the FieldCodec.cs I have seen a FieldCodec.ForBytes() method, but it wants to have some tag. What would be a suitable tag here?
The reason I am asking this question is that I am trying to take a RepeatedField in my real project, generate an MD5 string over all of it, and use it in the If-None-Match/ETag HTTP header.
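One way to sidestep FieldCodec entirely is to hash each element's own serialized bytes. This is only a sketch, under the assumption that T is a generated protobuf message type (i.e. implements IMessage), as PhoneNumber is:
// using System.IO; using System.Linq; using System.Security.Cryptography;
// using Google.Protobuf; using Google.Protobuf.Collections;
private static string RepeatedFieldToMD5String<T>(RepeatedField<T> repeatedField)
    where T : IMessage
{
    using (var memoryStream = new MemoryStream())
    {
        foreach (var element in repeatedField)
        {
            // MessageExtensions.ToByteArray serializes a single message.
            var bytes = element.ToByteArray();
            memoryStream.Write(bytes, 0, bytes.Length);
        }
        memoryStream.Position = 0;
        using (var md5 = MD5.Create())
        {
            return string.Concat(md5.ComputeHash(memoryStream)
                                    .Select(x => x.ToString("x2")));
        }
    }
}
Note that this hashes the concatenated element encodings rather than the field's exact wire-format encoding (no tags or length prefixes), which is fine for an ETag as long as it is computed the same way on both sides.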

Convert json string response representing UCHAR array to Byte array

I'm calling a JSON-RPC API that returns a UCHAR array representing a PDF file (so the result property on the response contains a string representation of a UCHAR array). I need to convert this result string into a byte array so I can handle it as a PDF file, i.e., save it and/or forward it as a file in a POST to another API.
I have tried the following (the result variable is the returned UCHAR string):
char[] pdfChar = result.ToCharArray();
byte[] pdfByte = new byte[pdfChar.Length];
for (int i = 0; i < pdfChar.Length; i++)
{
pdfByte[i] = Convert.ToByte(pdfChar[i]);
}
File.WriteAllBytes(basePath + "test.pdf", pdfByte);
I have also tried:
byte[] pdfByte = Encoding.ASCII.GetBytes(pdfObj.result);
File.WriteAllBytes(basePath + "test.pdf", pdfByte);
With both of these, when I try to open the resulting test.pdf file, it will not open, presumably because it was not converted properly.
Turns out that, although the output of the API function is UCHAR, when it comes in as part of the JSON string, it is a base64 string, so this works for me:
byte[] pdfBytes = Convert.FromBase64String(pdfObj.result);
I'm pretty sure the API is making that conversion "under the hood", i.e., while the function being called returns UCHAR, the api is using a framework to create the JSON-RPC responses, and it is likely performing the conversion before sending it out. If it is .NET that makes this conversion from UCHAR to base64, then please feel free to chime in and confirm this.
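Once the bytes are decoded, saving the PDF or forwarding it in a POST (as mentioned in the question) is straightforward. A minimal sketch; the target URL and form field name are placeholders, and the snippet assumes an async context:
// using System.IO; using System.Net.Http; using System.Net.Http.Headers;
byte[] pdfBytes = Convert.FromBase64String(pdfObj.result);
File.WriteAllBytes(Path.Combine(basePath, "test.pdf"), pdfBytes);

using (var client = new HttpClient())
using (var content = new MultipartFormDataContent())
{
    var fileContent = new ByteArrayContent(pdfBytes);
    fileContent.Headers.ContentType = new MediaTypeHeaderValue("application/pdf");
    content.Add(fileContent, "file", "test.pdf");   // "file" is a placeholder field name

    var response = await client.PostAsync("https://example.com/upload", content);
    response.EnsureSuccessStatusCode();
}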
Do you know the file encoding format? Try to use this:
return System.Text.Encoding.UTF8.GetString(pdfObj.result);
EDIT:
The solution you found is also reported here:
var base64EncodedBytes = System.Convert.FromBase64String(pdfObj.result);
return System.Text.Encoding.UTF8.GetString(base64EncodedBytes);

Why is a binary file very large compared to text?

I have been keeping a large set of data as TEXT records in a TEXT file:
yyyyMMddTHHmmssfff double1 double2
However, when I read it I need to parse each DateTime. This is quite slow for millions of records.
So, now I am trying it as a binary file, which I created by serializing my class.
That way I do not need to parse the DateTime.
class MyRecord
{
    DateTime DT;
    double Price1;
    double Price2;
}
public byte[] SerializeToByteArray()
{
    var bf = new BinaryFormatter();
    using (var ms = new MemoryStream())
    {
        bf.Serialize(ms, this);
        return ms.ToArray();
    }
}
MyRecord mr = new MyRecord();
outBin = new BinaryWriter(File.Create(binFileName, 2048, FileOptions.None));
for (AllRecords) //Pseudo
{
mr = new MyRecord(); //Pseudo
outBin.Write(mr.SerializeToByteArray());
}
The resulting binary is on average 3 times the size of the TEXT file.
Is that to be expected?
EDIT 1
I am exploring using Protobuf to help me:
I want to do this with a using block, to fit my existing structure.
private void DisplayBtn_Click(object sender, EventArgs e)
{
    string fileName = dbDirectory + @"\nAD20120101.dat";
    FileStream fs = File.OpenRead(fileName);
    MyRecord tr;
    while (fs.CanRead)
    {
        tr = Serializer.Deserialize<MyRecord>(fs);
        Console.WriteLine("> " + tr.ToString());
    }
}
BUT after the first record, tr is full of zeroes.
Your archive likely has considerable overhead serializing type information with each record.
Instead, make the whole collection serializable (if it isn't already) and serialize that in one go.
You are not storing a simple binary version of your DateTime values, but serialized objects representing them. That is much larger than simply storing your dates as text.
If you create a class
class MyRecords
{
    DateTime[] DT;
    double[] Price1;
    double[] Price2;
}
And serialize that, it should be much smaller.
Also, DateTime probably still needs a lot of space, so you can convert your DateTime to an integer Unix timestamp and store that.
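A minimal sketch of that conversion, assuming .NET 4.6+ for DateTimeOffset's Unix-time helpers:
// Convert a DateTime to a Unix timestamp (seconds since 1970-01-01 UTC) and back.
// ToUniversalTime() treats Unspecified kinds as local time, so store UTC where possible.
static long ToUnixSeconds(DateTime dt) =>
    new DateTimeOffset(dt.ToUniversalTime(), TimeSpan.Zero).ToUnixTimeSeconds();

static DateTime FromUnixSeconds(long seconds) =>
    DateTimeOffset.FromUnixTimeSeconds(seconds).UtcDateTime;
If millisecond precision matters (the original format has HHmmssfff), ToUnixTimeMilliseconds and FromUnixTimeMilliseconds work the same way.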
As requested by the OP:
The output is not a plain binary dump of your fields; it is a binary serialization of the instances, plus the overhead BinaryFormatter adds to allow deserialization later. For this reason the file ends up about 3 times larger than expected.
If you need a smarter serialization solution, you can take a look at protobuf-net: https://code.google.com/p/protobuf-net/
Here is how you can annotate your class to achieve this:
[ProtoContract]
public class MyRecord
{
    [ProtoMember(1)]
    DateTime DT;
    [ProtoMember(2)]
    double Price1;
    [ProtoMember(3)]
    double Price2;
}

How to write numbers to a file and make them readable between Java and C#

I'm running into a "compatibility" issue between two versions of the same program: the first one written in Java, the second a port to C#.
My goal is to write some data to a file (for example, in Java), like a sequence of numbers, and then be able to read it in C#. Obviously, the operation should also work in the reverse order.
For example, I want to write 3 numbers in sequence, represented with the following schema:
first number as one 'byte' (8 bit)
second number as one 'integer' (32 bit)
third number as one 'integer' (32 bit)
So, I can write the following sequence to a new file: 2 (as byte), 120 (as int32), 180 (as int32)
In Java, the writing procedure is more or less this one:
FileOutputStream outputStream;
byte[] byteToWrite;
// ... initialization....
// first byte
outputStream.write(first_byte);
// integers
byteToWrite = ByteBuffer.allocate(4).putInt(first_integer).array();
outputStream.write(byteToWrite);
byteToWrite = ByteBuffer.allocate(4).putInt(second_integer).array();
outputStream.write(byteToWrite);
outputStream.close();
While the reading part it's the following:
FileInputStream inputStream;
ByteBuffer byteToRead;
// ... initialization....
// first byte
first_byte = inputStream.read();
// integers
byteToRead = ByteBuffer.allocate(4);
inputStream.read(byteToRead.array());
first_integer = byteToRead.getInt();
byteToRead = ByteBuffer.allocate(4);
inputStream.read(byteToRead.array());
second_integer = byteToRead.getInt();
inputStream.close();
C# code is the following. Writing:
FileStream fs;
byte[] byteToWrite;
// ... initialization....
// first byte
byteToWrite = new byte[1];
byteToWrite[0] = first_byte;
fs.Write(byteToWrite, 0, byteToWrite.Length);
// integers
byteToWrite = BitConverter.GetBytes(first_integer);
fs.Write(byteToWrite, 0, byteToWrite.Length);
byteToWrite = BitConverter.GetBytes(second_integer);
fs.Write(byteToWrite, 0, byteToWrite.Length);
Reading:
FileStream fs;
byte[] byteToRead;
// ... initialization....
// first byte
byte[] firstByteBuff = new byte[1];
fs.Read(firstByteBuff, 0, firstByteBuff.Length);
first_byte = firstByteBuff[0];
// integers
byteToRead = new byte[4 * 2];
fs.Read(byteToRead, 0, byteToRead.Length);
first_integer = BitConverter.ToInt32(byteToRead, 0);
second_integer = BitConverter.ToInt32(byteToRead, 4);
Please note that both procedures work when the same Java/C# version of the program writes and reads the file. The problem is when I try to read a file written by the Java program from the C# version, and vice versa. The integers read back are always "strange" numbers (like -1451020...).
There's surely a compatibility issue regarding the way Java stores and reads 32-bit integer values (always signed, right?), in contrast to C#. How do I handle this?
It's just an endian-ness issue. You can use my MiscUtil library to read big-endian data from .NET.
However, I would strongly advise a simpler approach to both your Java and your .NET:
In Java, use DataInputStream and DataOutputStream. There's no need to get complicated with ByteBuffer etc.
In .NET, use EndianBinaryReader from MiscUtil, which extends BinaryReader (and likewise EndianBinaryWriter for BinaryWriter)
Alternatively, consider just using text instead.
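If you'd rather not add a dependency, a minimal sketch of handling the endianness by hand on the .NET reading side (the Java code above writes big-endian, so the bytes just need to be reversed on a little-endian platform):
// Reads a big-endian (network order) 32-bit integer from a stream.
static int ReadInt32BigEndian(Stream stream)
{
    var buffer = new byte[4];
    int read = 0;
    while (read < 4)
    {
        int n = stream.Read(buffer, read, 4 - read);
        if (n == 0) throw new EndOfStreamException();
        read += n;
    }
    if (BitConverter.IsLittleEndian)
        Array.Reverse(buffer);   // flip to the platform's byte order
    return BitConverter.ToInt32(buffer, 0);
}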
I'd consider using a standard format like XML or JSON to store your data. Then you can use standard serializers in both Java and C# to read/write the file. This sort of approach lets you easily name the data fields, read it from many languages, be easily understandable if someone opens the file in a text editor, and more easily add data to be serialized.
E.g. you can read/write JSON with Gson in Java and Json.NET in C#. The class might look like this in C#:
public class MyData
{
public byte FirstValue { get; set; }
public int SecondValue { get; set; }
public int ThirdValue { get; set; }
}
// serialize to string example
var myData = new MyData { FirstValue = 2, SecondValue = 5, ThirdValue = -1 };
string serialized = JsonConvert.SerializeObject(myData);
It would serialize to
{"FirstValue":2,"SecondValue":5,"ThirdValue":-1}
The Java would, similarly, be quite simple. You can find examples of how to read/write files in each library.
Or if an array would be a better model for your data:
string serialized = JsonConvert.SerializeObject(new[] { 2, 5, -1 }); // [2,5,-1]

How can I convert a byte array to a string array?

Just to clarify something first. I am not trying to convert a byte array to a single string. I am trying to convert a byte-array to a string-array.
I am fetching some data from the clipboard using the GetClipboardData API, and then I'm copying the data from the memory as a byte array. When you're copying multiple files (hence a CF_HDROP clipboard format), I want to convert this byte array into a string array of the files copied.
Here's my code so far.
//Get pointer to clipboard data in the selected format
var clipboardDataPointer = GetClipboardData(format);
//Do a bunch of crap necessary to copy the data from the memory
//the above pointer points at to a place we can access it.
var length = GlobalSize(clipboardDataPointer);
var @lock = GlobalLock(clipboardDataPointer);
//Init a buffer which will contain the clipboard data
var buffer = new byte[(int)length];
//Copy clipboard data to buffer
Marshal.Copy(@lock, buffer, 0, (int)length);
GlobalUnlock(clipboardDataPointer);
snapshot.InsertData(format, buffer);
Now, here's my code for reading the buffer data afterwards.
var formatter = new BinaryFormatter();
using (var serializedData = new MemoryStream(buffer))
{
paths = (string[]) formatter.Deserialize(serializedData);
}
This won't work, and it'll crash with an exception saying that the stream doesn't contain a binary header. I suppose this is because it doesn't know which type to deserialize into.
I've looked through the Marshal class. Nothing seems relevant.
If the data came through the Win32 API then a string array will just be a sequence of null-terminated strings with a double-null-terminator at the end. (Note that the strings will be UTF-16, so two bytes per character). You'll basically need to pull the strings out one at a time into an array.
The method you're looking for here is Marshal.PtrToStringUni, which you should use instead of Marshal.Copy since it works on an IntPtr. It will extract a string, up to the first null character, from your IntPtr and copy it to a string.
The idea would be to continually extract a single string, then advance the IntPtr past that string's null terminator (two bytes, since the strings are UTF-16) to the start of the next string, until you run out of buffer. I have not tested this, and it could probably be improved (in particular I think there's a smarter way to detect the end of the buffer), but the basic idea would be:
var myptr = GetClipboardData(format);
var length = (int)GlobalSize(myptr);
var result = new List<string>();
var pos = 0;
while (pos < length)
{
    var str = Marshal.PtrToStringUni(myptr);
    if (string.IsNullOrEmpty(str))
        break;                             // reached the double-null terminator
    var count = Encoding.Unicode.GetByteCount(str);
    myptr = IntPtr.Add(myptr, count + 2);  // skip the string plus its 2-byte null terminator
    pos += count + 2;
    result.Add(str);
}
return result.ToArray();
(By the way: the reason your deserialization doesn't work is because serializing a string[] doesn't just write out the characters as bytes; it writes out the structure of a string array, including additional internal bits that .NET uses like the lengths, and a binary header with type information. What you're getting back from the clipboard has none of that present, so it cannot be deserialized.)
How about this:
var strings = Encoding.Unicode
.GetString(buffer)
.Split(new[] { '\0' }, StringSplitOptions.RemoveEmptyEntries);
