Decoding base64-encoded data from xml document - c#

I receive some xml-files with embedded base64-encoded images, that I need to decode and save as files.
An unmodified (other than zipped) example of such a file can be downloaded below:
20091123-125320.zip (60KB)
However, I get errors like "Invalid length for a Base-64 char array" and "Invalid character in a Base-64 string". I marked the line in the code where I get the error in the code.
A file could look like this:
<?xml version="1.0" encoding="windows-1252"?>
<mediafiles>
<media media-type="image">
<media-reference mime-type="image/jpeg"/>
<media-object encoding="base64"><![CDATA[/9j/4AAQ[...snip...]P4Vm9zOR//Z=]]></media-object>
<media.caption>What up</media.caption>
</media>
</mediafiles>
And the code to process like this:
var xd = new XmlDocument();
xd.Load(filename);
var nodes = xd.GetElementsByTagName("media");
foreach (XmlNode node in nodes)
{
var mediaObjectNode = node.SelectSingleNode("media-object");
//The line below is where the errors occur
byte[] imageBytes = Convert.FromBase64String(mediaObjectNode.InnerText);
//Do stuff with the bytearray to save the image
}
The xml-data is from an enterprise newspaper system, so I am pretty sure the files are ok - and there must be something in the way I process them, that is just wrong. Maybe a problem with the encoding?
I have tried writing out the contents of mediaObjectNode.InnerText, and it is the base64 encoded data - so the navigating the xml-doc is not the issue.
I have been googling, binging, stackoverflowing and crying - and found no solution... Help!
Edit:
Added an actual example file (and a bounty). PLease note the downloadable file is in a bit different schema, since I simplified it in the above example, removing irrelevant stuff...

For a first shot i didn't use any programming language, just Notepad++
I opened the xml file within and copy and pasted the raw base64 content into a new file (without square brackets).
Afterwards I selected everything (Strg-A) and used the option Extensions - Mime Tools - Base64 decode. This threw an error about the wrong text length (must be mod 4). So i just added two equal signs ('=') as placeholder at the end to get the correct length.
Another retry and it decoded successfully into 'something'. Just save the file as .jpg and it opens like a charm in any picture viewer.
So i would say, there IS something wrong with the data you'll get. They just don't have the right numbers of equal signs at the end to fill up to a number of signs which can be break into packets of 4.
The 'easy' way would be to add the equal sign till the decoding doesn't throw an error. The better way would be to count the number of characters (minus CR/LFs!) and add the needed ones in one step.
Further investigations
After some coding and reading of the convert function, the problem is a wrong attaching of a equal sign from the producer. Notepad++ has no problem with tons of equal signs, but the Convert function from MS only works with zero, one or two signs. So if you fill up the already existing one with additional equal signs you get an error too! To get this damn thing to work, you have to cut off all existing signs, calculate how much are needed and add them again.
Just for the bounty, here is my code (not absolute perfect, but enough for a good starting point): ;-)
static void Main(string[] args)
{
var elements = XElement
.Load("test.xml")
.XPathSelectElements("//media/media-object[#encoding='base64']");
foreach (XElement element in elements)
{
var image = AnotherDecode64(element.Value);
}
}
static byte[] AnotherDecode64(string base64Decoded)
{
string temp = base64Decoded.TrimEnd('=');
int asciiChars = temp.Length - temp.Count(c => Char.IsWhiteSpace(c));
switch (asciiChars % 4)
{
case 1:
//This would always produce an exception!!
//Regardless what (or what not) you attach to your string!
//Better would be some kind of throw new Exception()
return new byte[0];
case 0:
asciiChars = 0;
break;
case 2:
asciiChars = 2;
break;
case 3:
asciiChars = 1;
break;
}
temp += new String('=', asciiChars);
return Convert.FromBase64String(temp);
}

The base64 string is not valid as Oliver has already said, the string length must be multiples of 4 after removing white space characters. If you look at then end of the base64 string (see below) you will see the line is shorter than the rest.
RRRRRRRRRRRRRRRRRRRRRRRRRRRRX//Z=
If you remove this line, your program will work, but the resulting image will have a missing section in the bottom right hand corner. You need to pad this line so the overall string length is corect. From my calculations if you had 3 characters it should work.
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRX//Z=

remove last 2 characters while image not get proper
public Image Base64ToImage(string base64String)
{
// Convert Base64 String to byte[]
byte[] imageBytes=null;
bool iscatch=true;
while(iscatch)
{
try
{
imageBytes = Convert.FromBase64String(base64String);
iscatch = false;
}
catch
{
int length=base64String.Length;
base64String=base64String.Substring(0,length-2);
}
}
MemoryStream ms = new MemoryStream(imageBytes, 0,
imageBytes.Length);
// Convert byte[] to Image
ms.Write(imageBytes, 0, imageBytes.Length);
Image image = Image.FromStream(ms, true);
pictureBox1.Image = image;
return image;
}

Try using Linq to XML:
using System.Xml.XPath;
class Program
{
static void Main(string[] args)
{
var elements = XElement
.Load("test.xml")
.XPathSelectElements("//media/media-object[#encoding='base64']");
foreach (var element in elements)
{
byte[] image = Convert.FromBase64String(element.Value);
}
}
}
UPDATE:
After downloading the XML file and analyzing the value of the media-object node it is clear that it is not a valid base64 string:
string value = "PUT HERE THE BASE64 STRING FROM THE XML WITHOUT THE NEW LINES";
byte[] image = Convert.FromBase64String(value);
throws a System.FormatException saying that the length is not a valid base 64 string. Event when I remove the \n from the string it doesn't work:
var elements = XElement
.Load("20091123-125320.xml")
.XPathSelectElements("//media/media-object[#encoding='base64']");
foreach (var element in elements)
{
string value = element.Value.Replace("\n", "");
byte[] image = Convert.FromBase64String(value);
}
also throws System.FormatException.

I've also had a problem with decoding Base64 encoded string from XML document (specifically Office OpenXML package document).
It turned out that string had additional encoding applied: HTML encoding, so doing first HTML decoding and then Base64 decoding did the trick:
private static byte[] DecodeHtmlBase64String(string value)
{
return System.Convert.FromBase64String(System.Net.WebUtility.HtmlDecode(value));
}
Just in case someone else stumbles on the same issue.

Well, it's all very simple. CDATA is a node itself, so mediaObjectNode.InnerText actually produces <![CDATA[/9j/4AAQ[...snip...]P4Vm9zOR//Z=]]>, which is obviously not valid Base64-encoded data.
To make things work, use mediaObjectNode.ChildNodes[0].Value and pass that value to Convert.FromBase64String'.

Is the character encoding correct? The error sounds like there's a problem that causes invalid characters to appear in the array. Try copying out the text and decoding manually to see if the data is indeed valid.
(For the record, windows-1252 is not exactly the same as iso-8859-1, so that may be the cause of a problem, barring other sources of corruption.)

Related

All Characters in my Bitmap Textfile are in Chinese

So I am currently working on a program that will extract materials from .fbm files. In the ASCII fbm files, the data to extracted looks as follows:
/9j/4Sb7RXhpZgAATU0AKgAAAAgADAEAAAMAAAABEAAAAAEBAAMAAAABEAAAAAECAAMAAAADAAAAngEGAAMAAAABAAIAAAESAAMAAAABAAEAAAEVAAMAAAABAAMAAAEaAAUAAAABAAAApAEbAAUAAAABAAAArAEoAAMAAAABAAIAAAExAAIAAAAiAAAAtAEyAAIAAAAUAAAA1odpAAQAAAABAAAA7AAAASQACAAIAAgACvyAAAAnEAAK/IAAACcQQWRvYmUgUGhvdG9zaG9wIENDIDIwMTcgKFdpbmRvd3MpADIwMTk6MDc6MD
...
And there are several sets of these in the fbm file, each in quotations and comma-separated. Now, when I convert the first one of these strings in the file to a jpg, using the following:
//Convert ASCII data to binary
string asciiFileData = "";
StreamReader binaryDataReader = File.OpenText(Path.Combine(inputDirectory, convertedFBX + iterator.ToString() + ".txt"));
while (!binaryDataReader.EndOfStream)
{
var lineData = binaryDataReader.ReadLine();
if (String.IsNullOrEmpty(lineData)) continue;
asciiFileData += lineData;
}
string[] imageStrings = asciiFileData.Split(',');
List<byte[]> imageList = new List<byte[]>();
foreach (string imageString in imageStrings)
{
if(imageString.Length > 10)//A way of checking if there's actual data for the file to save
imageList.Add(Convert.FromBase64String(imageString.Trim().Replace(",", "").Replace("\"", "")));
}
//Save images
int iterator2 = 1;
foreach(byte[] image in imageList)
{
File.WriteAllBytes(Path.Combine(inputDirectory, convertedFBX + iterator.ToString() + iterator2++ + ".jpg"), image);
}
The first string creates a proper jpg of the materials. When I open it up in a text file, it's the usual strange alien characters (not quite positive what those are called). However, the jpgs after the first one cannot open. I open up the text files for them, and all the characters are in Chinese! Why on earth is that happening? What does it mean, and is it supposed to be that way? Thanks in advance for your answers!
This resource was mentioned in the comments; however, my answer was found at https://devblogs.microsoft.com/oldnewthing/20140930-00/?p=43953. Essentially, this error is what occurs when you try to force binary content to be some sort of specific encoding that it's not meant to be.

Unable to convert string to byte if high-order bit is set

Apologies for the abortive first try, particularly to Olivier. Trying again.
Situation is we have a string coming in from a mainframe to a C# app. We understand it needs to be converted to a byte array. However, this data is a mixture of ASCII characters and true binary UINT16 and UINT32 fields, which are not always in the same spot in the data. Later on we will deserialize the data and will know the structure's data alignments, but not at this juncture.
Logic flow briefly is to send a structure with binary embedded, receive a reply with binary embedded, convert string reply to bytes (this is where we have issues), deserialize the bytes based on an embedded structure name, then process the structure. Until we reach deserialize, we don't know where the UINTs are. Bits are bits at this point.
When we have a reply byte which is ultimately part of a UINT16, and that byte has the high-order bit set (making it "extended ascii" or "negative", however you want to say it), that byte is converted to nulls. So any value >= 128 in that byte is lost.
Our code to convert looks like this:
public async Task<byte[]> SendMessage(byte[] sendBytes)
{
byte[] recvbytes = null;
var url = new Uri("http://<snipped>");
WebRequest webRequest = WebRequest.Create(url);
webRequest.Method = "POST";
webRequest.ContentType = "application/octet-stream";
webRequest.Timeout = 10000;
using (Stream postStream = await webRequest.GetRequestStreamAsync().ConfigureAwait(false))
{
await postStream.WriteAsync(sendBytes, 0, sendBytes.Length);
await postStream.FlushAsync();
}
try
{
string Response;
int Res_lenght;
using (var response = (HttpWebResponse)await webRequest.GetResponseAsync())
using (Stream streamResponse = response.GetResponseStream())
using (StreamReader streamReader = new StreamReader(streamResponse))
{
Response = await streamReader.ReadToEndAsync();
Res_lenght = Response.Length;
}
if (string.IsNullOrEmpty(Response))
{
recvbytes = null;
}
else
{
recvbytes = ConvertToBytes(Response);
var table = (Encoding.Default.GetString(
recvbytes,
0,
recvbytes.Length - 1)).Split(new string[] { "\r\n", "\r", "\n" },
StringSplitOptions.None);
}
}
catch (WebException e)
{
//error
}
return recvbytes;
}
static byte[] ConvertToBytes(string inputString)
{
byte[] outputBytes = new byte[inputString.Length * sizeof(byte)];
String strLocalDate = DateTime.Now.ToString("hh.mm.ss.ffffff");
String fileName = "c:\\deleteMe\\Test" + strLocalDate;
fileName = fileName + ".txt";
StreamWriter writer = new StreamWriter(fileName, true);
for (int i=0;i<inputString.Length;i++) {
try
{
outputBytes[i] = Convert.ToByte(inputString[i]);
writer.Write("String in: {0} \t Byte out: {1} \t Index: {2} \n", inputString.Substring(i, 2), outputBytes[i], i);
}
catch (Exception ex)
{
//error
}
}
writer.Flush();
return outputBytes;
}
ConvertToBytes has a line in the FOR loop to display the values in and out, plus the index value. Here is one of several spots where we see the conversion error - note indexes 698 and 699 represent a UINT16:
String in: sp Byte out: 32 Index: 696 << sp = space
String in: sp Byte out: 32 Index: 697
String in: \0 Byte out: 0 Index: 698
String in: 2 Byte out: 50 Index: 700 << where is 699?
String in: 0 Byte out: 48 Index: 701
String in: 1 Byte out: 49 Index: 702
String in: 6 Byte out: 54 Index: 703
The expected value for index 699 is decimal 156, which is binary 10011100. The high order bit is on. So the conversion for #698 is correct, and for #700, which is an ascii 2 is correct, but not for #699. Given the UINT16 (0/156) is a component of the key to subsequent records, seeing 0/0 for the values is a show-stopper. We don't have a displacement error for 699, we see nulls in the deserialize. No idea why the .Write didn't report it.
Another example, such as 2/210 (decimal 722 when seen as a full UINT16) come out as 2/0 (decimal 512).
Please understand this code as shown above works for everything except the 8-bit reply string fields which have the high-order bit set.
Any suggestions how to convert a string element to byte regardless of the content of the string element would be appreciated. Thanks!
Without a good Minimal, Complete, and Verifiable example that reliably reproduces the problem, it's impossible to state specifically what is wrong. But given what you've posted, some useful observations can be made:
First and foremost, as far as "where is 699?" goes, it's obvious that an exception is being thrown. That's how the WriteLine() call would be skipped and result in no output for that index. You have a couple of opportunities in the code you posted for that to happen: the call to Convert.ToByte(), or the following statement (particularly the call to inputString.Substring()).
Unfortunately, without a good MCVE it's hard to understand why you are printing a two-character substring from the input string, or why you say the characters "sp" become the character value 0x20 (i.e. a space character). The output you describe in the question doesn't appear to be self-consistent. But, let's move on…
Assuming for the moment that at least in the specific case you're looking at, there are enough characters in inputString at that point for the call to Substring() to succeed, we're left with the conclusion that the call to Convert.ToByte() is failing.
Given what you wrote, it seems that the main issue here is a misunderstanding on your part about how text is encoded and manipulated in a C# program. In particular, a C# character is in some sense an abstraction and doesn't have an encoding at all. To the extent that you force the encoding to be revealed, i.e. by casting or otherwise converting the raw character value directly, that value is always encoded as UTF16.
Put another way: you are dealing with a C# string object, made of C# char values. I.e. by the time you get this text into your program and call the ConvertToBytes() method, it's already been converted to UFT16, regardless of the encoding used by the sender.
In UTF16, character values that would be greater than 127 (0x7f) in an "extended ASCII" encoding (e.g. any of the various ANSI/OEM/ISO single-byte encodings) are not encoded as their original value. Instead, they will have a 16-bit value greater than 255.
When you ask Convert.ToByte() to convert such a value to a byte, it will throw an exception, because the value is larger than the largest value that can fit in a byte.
It is fairly clear why the code you posted is producing the results you describe (at least, to some extent). But it is not clear at all what you are actually hoping to accomplish here. I can say that attempting to convert char values to/from byte values by straight casting is simply not going to work. The char type isn't a byte, it's two bytes and any non-ASCII characters will use larger values than can fit in a byte. You should be using one of the several .NET classes that actually will do text encoding, such as the Encoding.GetBytes() method.
Of course, to do that you'll have to make sure you first understand precisely why you are trying to convert to bytes and what encoding you want to use. The code you posted seems to be trying to interpret your encoded bytes as the current Encoding.Default encoding, so you should use that encoding to encode the text. But there's not really any value in encoding to that encoding only to decode back to a C# string value. Assuming you've done it correctly, all that will happen is you'll get exactly the same string you started with.
In other words, while I can explain the behavior you're seeing to the extent that you've described it here, that's unlikely to address whatever broader problem you are actually trying to solve. If the above does not get you back on track, please post a new question in which you've included a good MCVE and a clear explanation of what that broader problem you're trying to solve actually us.

Understand how to decode this base64 encoded string

I've handled base64 encoded images and strings and have been able to decode them using C# in the past.
I'm now trying on what looks to me like a base64 string, but the value I'm getting is about 98% accurate and I just don't understand what is affecting the output.
Here is the string:
http://pastebin.com/ntcth6uN
And this is the decoded value:
http://pastebin.com/Buh4xXDA
That IS what it should be, but you can clearly see where there are artifacts and the decoded value isn't quite right.
Any idea why it's failing?
var data = Convert.FromBase64String(Faces[i].InfoData);
Faces[i].InfoData = Encoding.UTF8.GetString(data);
Thanks for your help.
The string is not encoded as UTF8, but instead as another encoding. Thus the same encoding must be used to decode it.
Use the following to decode it:
Encoding.ASCII.GetString(data);
If ASCII isn't the correct encoding, here's some code that will iterate through available encodings and list the first 200 characters in order to manually select original encoding:
String encStr = "PFdhdGNoIG5hbWU9IlBpa2FjaHUDV2F0Y2DDQ2hyb25vIiBkZXNjcmlwdGlvbj0iIiBhdXRob3I9IkFsY2hlbWlzdFByaW1lIiB3ZWJfbGluaz0iaHR0cgov42ZhY2VyZXBv4mNvbS9hcHAvZmFjZXMvdXNlci9BbGNoZW1pc3RQcmltZS8xIiBiZ19jb2xvcj0iYWJhYmFiIiBpbmRfbG9jPSJ0YyIDaW5kX2JnPSJZIiBob3R3b3JkX2xvYz0idGMiIGhvdHdvcmRfYmc9IlkiPDoDICADPExheWVyIHR5cGU9ImltYWdlIiBLPSIwIiB5PSIwIiBneXJvPSIwIiByb3RhdGlvbj0iMCIDc2tld19LPSIwIiBza2V3X3k9IjAiIG9wYWNpdHk9IjEwMCIDYWxpZ25tZW50PSJjYyIDcGF0ag0i4mltZzQ5OTIucHBuZyIDd2lkdGD9IjU1MCIDaGVpZ2h0PSI1NTAiIGNvbG9yPSJmZmZmZmYiIGRpc3BsYXk9ImJkIi8+CiADICA8TGF5ZXIDdHlwZT0idGVLdCIDeg0iMTciIHk9IjQ5IiBneXJvPSIwIiByb3RhdGlvbj0iMCIDc2tld19LPSIwIiBza2V3X3k9IjAiIG9wYWNpdHk9IjEwMCIDYWxpZ25tZW50PSJjYyIDdGVLdg0ie2RofTp7ZG16fSIDdGVLdF9zaXplPSIzMiIDZm9udg0iTENEUEhPTkUiIHRyYW5zZm9ybT0ibiIDY29sb3JfZGltPSIxZgFkMWQiIGNvbG9yPSIxZgFkMWQiIGRpc3BsYXk9ImJkIi8+CiADICA8TGF5ZXIDdHlwZT0idGVLdCIDeg0iOTAiIHk9IjU0IiBneXJvPSIwIiByb3RhdGlvbj0iMCIDc2tld19LPSIwIiBza2V3X3k9IjAiIG9wYWNpdHk9IjEwMCIDYWxpZ25tZW50PSJjYyIDdGVLdg0ie2Rzen0iIHRleHRfc2l6ZT0iMTDiIGZvbnQ9IkxgRFBIT05FIiB0cmFuc2Zvcm09ImLiIGNvbG9yX2RpbT0iMWQxZgFkIiBjb2xvcj0iMWQxZgFkIiBkaXNwbGF5PSJiZCIvPDoDICADPExheWVyIHR5cGU9InRleHQiIHD9IjEyNCIDeT0iNTQiIGd5cm89IjAiIHJvdGF0aW9uPSIwIiBza2V3X3D9IjAiIHNrZXdfeT0iMCIDb3BhY2l0eT0ie3N3cnN9PT0DMCBhbmQDMTAwIG9yIgAiIGFsaWdubWVudg0iY2MiIHRleHQ9IntkYX0iIHRleHRfc2l6ZT0iMjAiIGZvbnQ9IkxgRFBIT05FIiB0cmFuc2Zvcm09ImLiIGNvbG9yX2RpbT0iMWQxZgFkIiBjb2xvcj0iMWQxZgFkIiBkaXNwbGF5PSJiZCIvPDoDICADPExheWVyIHR5cGU9InRleHQiIHD9Ii0LOSIDeT0iNTIiIGd5cm89IjAiIHJvdGF0aW9uPSIwIiBza2V3X3D9IjAiIHNrZXdfeT0iMCIDb3BhY2l0eT0iMTAwIiBhbGlnbm1lbnQ9ImNjIiB0ZXh0PSJ7Ymx9IiB0ZXh0X3NpemU9IjE5IiBmb250PSJMQ0RQSE9ORSIDdHJhbnNmb3JtPSJuIiBjb2xvcl9kaW09IjFkMWQxZCIDY29sb3I9IjFkMWQxZCIDZGlzcGxheT0iYmQi4zLKICADIgxMYXllciB0eXBlPSJzaGFwZSIDeg0iMSIDeT0iNgDiIGd5cm89IjAiIHJvdGF0aW9uPSIwIiBza2V3X3D9IjAiIHNrZXdfeT0iMCIDb3BhY2l0eT0ie3N3cnN9PT0DMCBhbmQDMCBvciAxMgAiIGFsaWdubWVudg0iY2MiIHNoYXBlPSJTcXVhcmUiIHdpZHRoPSIyNzDiIGhlaWdodg0iNgUiIGNvbG9yPSJhYmFiYWIiIGRpc3BsYXk9ImJkIi8+CiADICA8TGF5ZXIDdHlwZT0idGVLdCIDeg0iMCIDeT0iNgDiIGd5cm89IjAiIHJvdGF0aW9uPSIwIiBza2V3X3D9IjAiIHNrZXdfeT0iMCIDb3BhY2l0eT0ie3N3cnN9PT0DMCBhbmQDMCBvciAxMgAiIGFsaWdubWVudg0iY2MiIHRleHQ9Intzd219Ontzd3N9Ontzd3Nzc30iIHRleHRfc2l6ZT0iMzYiIGZvbnQ9IkxgRFBIT05FIiB0cmFuc2Zvcm09ImLiIGNvbG9yX2RpbT0iMWQxZgFkIiBjb2xvcj0iMWQxZgFkIiBkaXNwbGF5PSJiZCIvPDoDICADPExheWVyIHR5cGU9InRleHQiIHD9Ii0xMTDiIHk9Ijk2IiBneXJvPSIwIiByb3RhdGlvbj0iMCIDc2tld19LPSIwIiBza2V3X3k9IjAiIG9wYWNpdHk9IjEwMCIDYWxpZ25tZW50PSJjYyIDdGVLdg0ie3N3cn0DYW5kICZhcG9zO1NUT1AmYXBvczsDb3IDJmFwb3M7U1RBUlQmYXBvczsiIHRleHRfc2l6ZT0iMTUiIGZvbnQ9IlJvYm90by1SZWd1bGFyIiB0cmFuc2Zvcm09ImLiIGNvbG9yX2RpbT0iMWQxZgFkIiBjb2xvcj0iMWQxZgFkIiBkaXNwbGF5PSJiZCIDdGFwX2FjdGlvbj0ic3dfc3RhcnRfc3RvcCIvPDoDICADPExheWVyIHR5cGU9InRleHQiIHD9IjExOCIDeT0iOTYiIGd5cm89IjAiIHJvdGF0aW9uPSIwIiBza2V3X3D9IjAiIHNrZXdfeT0iMCIDb3BhY2l0eT0iMTAwIiBhbGlnbm1lbnQ9ImNjIiB0ZXh0PSJSRVNFVCIDdGVLdF9zaXplPSIxNSIDZm9udg0iUm9ib3Rv4VJlZ3VsYXIiIHRyYW5zZm9ybT0ibiIDY29sb3JfZGltPSIxZgFkMWQiIGNvbG9yPSIxZgFkMWQiIGRpc3BsYXk9ImJkIiB0YXBfYWN0aW9uPSJzd19yZXNldCIvPDoDICADPExheWVyIHR5cGU9InNoYXBlIiBLPSItMTI0IiB5PSIxNgEiIGd5cm89IjAiIHJvdGF0aW9uPSItOTAiIHNrZXdfeg0iMCIDc2tld195PSIwIiBvcGFjaXR5PSIxMgAiIGFsaWdubWVudg0iY2MiIHNoYXBlPSJUcmlhbmdsZSIDd2lkdGD9IjM1IiBoZWlnaHQ9IjM1IiBjb2xvcj0iMWQxZgFkIiBkaXNwbGF5PSJiZCIvPDoDICADPExheWVyIHR5cGU9InNoYXBlIiBLPSItMTE3IiB5PSIxNgQiIGd5cm89IjAiIHJvdGF0aW9uPSIwIiBza2V3X3D9IjAiIHNrZXdfeT0iMCIDb3BhY2l0eT0iMCIDYWxpZ25tZW50PSJjYyIDc2hhcGU9IlNxdWFyZSIDd2lkdGD9IjEyMCIDaGVpZ2h0PSIxMjAiIGNvbG9yPSIyOWI5ZgMiIGRpc3BsYXk9ImJkIiB0YXBfYWN0aW9uPSJzd19zdGFydF9zdG9wIi8+CiADICA8TGF5ZXIDdHlwZT0ic2hhcGUiIHD9IjEyNCIDeT0iMTQxIiBneXJvPSIwIiByb3RhdGlvbj0iOTAiIHNrZXdfeg0iMCIDc2tld195PSIwIiBvcGFjaXR5PSIxMgAiIGFsaWdubWVudg0iY2MiIHNoYXBlPSJUcmlhbmdsZSIDd2lkdGD9IjM1IiBoZWlnaHQ9IjM1IiBjb2xvcj0iMWQxZgFkIiBkaXNwbGF5PSJiZCIvPDoDICADPExheWVyIHR5cGU9InNoYXBlIiBLPSIxMTciIHk9IjE0NCIDZ3lybz0iMCIDcm90YXRpb2L9IjAiIHNrZXdfeg0iMCIDc2tld195PSIwIiBvcGFjaXR5PSIwIiBhbGlnbm1lbnQ9ImNjIiBzaGFwZT0iU3F1YXJlIiB3aWR0ag0iMTIwIiBoZWlnaHQ9IjEyMCIDY29sb3I9IjI5YjlkMyIDZGlzcGxheT0iYmQiIHRhcF9hY3Rpb2L9InN3X3Jlc2V0Ii8+CiADICA8TGF5ZXIDdHlwZT0idGVLdCIDeg0i4TDyIiB5PSItNgMiIGd5cm89IjAiIHJvdGF0aW9uPSIwIiBza2V3X3D9IjAiIHNrZXdfeT0iMCIDb3BhY2l0eT0iMTAwIiBhbGlnbm1lbnQ9ImNjIiB0ZXh0PSJ7d3RkfSIDdGVLdF9zaXplPSIyNSIDZm9udg0iTENEUEhPTkUiIHRyYW5zZm9ybT0ibiIDY29sb3JfZGltPSIxZgFkMWQiIGNvbG9yPSIxZgFkMWQiIGRpc3BsYXk9ImJkIi8+CiADICA8TGF5ZXIDdHlwZT0iaW1hZ2VfY29uZCIDeg0i4TD5IiB5PSItOTIiIGd5cm89IjAiIHJvdGF0aW9uPSIwIiBza2V3X3D9IjAiIHNrZXdfeT0iMCIDb3BhY2l0eT0iMTAwIiBhbGlnbm1lbnQ9ImNjIiBwYXRoPSJ3ZWF0aGVyX3NldF8z4nBwbmciIHdpZHRoPSI3MCIDaGVpZ2h0PSI3MCIDY29sb3I9IjQ1NgU0NSIDY29uZF92YWx1ZT0iJmFwb3M7e3djaX0mYXBvczsDPT0DJmFwb3M7MgFkJmFwb3M7IGFuZCAxIG9yICZhcG9zO3t3Y2l9JmFwb3M7Ig09ICZhcG9zOzAyZCZhcG9zOyBhbmQDMiBvciAmYXBvczt7d2NpfSZhcG9zOyA9PSAmYXBvczswM2QmYXBvczsDYW5kIgMDb3IDJmFwb3M7e3djaX0mYXBvczsDPT0DJmFwb3M7MgRkJmFwb3M7IGFuZCA0IG9yICZhcG9zO3t3Y2l9JmFwb3M7Ig09ICZhcG9zOzA5ZCZhcG9zOyBhbmQDNSBvciAmYXBvczt7d2NpfSZhcG9zOyA9PSAmYXBvczsxMGQmYXBvczsDYW5kIgYDb3IDJmFwb3M7e3djaX0mYXBvczsDPT0DJmFwb3M7MTFkJmFwb3M7IGFuZCA3IG9yICZhcG9zO3t3Y2l9JmFwb3M7Ig09ICZhcG9zOzEzZCZhcG9zOyBhbmQDOCBvciAmYXBvczt7d2NpfSZhcG9zOyA9PSAmYXBvczs1MGQmYXBvczsDYW5kIgkDb3IDMSIDY29uZF9ncmlkPSIzegMiIGRpc3BsYXk9ImJkIi8+CiADICA8TGF5ZXIDdHlwZT0idGVLdCIDeg0iMTEzIiB5PSIzIiBneXJvPSIwIiByb3RhdGlvbj0iMCIDc2tld19LPSIwIiBza2V3X3k9IjAiIG9wYWNpdHk9IjEwMCIDYWxpZ25tZW50PSJjYyIDdGVLdg0iTW9vbiIDdGVLdF9zaXplPSIxOCIDZm9udg0iTENEUEhPTkUiIHRyYW5zZm9ybT0ibiIDY29sb3JfZGltPSIxZgFkMWQiIGNvbG9yPSIxZgFkMWQiIGRpc3BsYXk9ImJkIi8+CiADICA8TGF5ZXIDdHlwZT0ic2hhcGUiIHD9IjExNCIDeT0i4TUyIiBneXJvPSIwIiByb3RhdGlvbj0iMCIDc2tld19LPSIwIiBza2V3X3k9IjAiIG9wYWNpdHk9IjEwMCIDYWxpZ25tZW50PSJjYyIDc2hhcGU9IkNpcmNsZSIDd2lkdGD9IjYwIiBoZWlnaHQ9IjYwIiBjb2xvcj0iNgU0NTQ1IiBkaXNwbGF5PSJiZCIvPDoDICADPExheWVyIHR5cGU9ImltYWdlX2NvbmQiIHD9IjExNCIDeT0i4TUyIiBneXJvPSIwIiByb3RhdGlvbj0iMCIDc2tld19LPSIwIiBza2V3X3k9IjAiIG9wYWNpdHk9IjEwMCIDYWxpZ25tZW50PSJjYyIDcGF0ag0ibW9vbl9zZXRfMy5wcG5nIiB3aWR0ag0iNjAiIGhlaWdodg0iNjAiIGNvbG9yPSJhYmFiYWIiIGNvbmRfdmFsdWU9Int3bXB9IiBjb25kX2dyaWQ9IjNLMyIDZGlzcGxheT0iYmQi4zLKICADIgxMYXllciB0eXBlPSJ0ZXh0IiBLPSIwIiB5PSIwIiBneXJvPSIwIiByb3RhdGlvbj0iMCIDc2tld19LPSIwIiBza2V3X3k9IjAiIG9wYWNpdHk9IjEwMCIDYWxpZ25tZW50PSJjYyIDdGVLdg0iIiB0ZXh0X3NpemU9IjQwIiBmb250PSJSb2JvdG8tUmVndWxhciIDdHJhbnNmb3JtPSJuIiBjb2xvcl9kaW09ImZmZmZmZiIDY29sb3I9ImZmZmZmZiIDZGlzcGxheT0iYmQi4zLKICADIgxMYXllciB0eXBlPSJ0ZXh0IiBLPSIxMCIDeT0i4TExNiIDZ3lybz0iMCIDcm90YXRpb2L9IjAiIHNrZXdfeg0iMCIDc2tld195PSIwIiBvcGFjaXR5PSIxMgAiIGFsaWdubWVudg0iY2MiIHRleHQ9IntkZHd9IHtkbm5ufSB7ZGR9IiB0ZXh0X3NpemU9IjE2IiBmb250PSJMQ0RQSE9ORSIDdHJhbnNmb3JtPSJuIiBjb2xvcl9kaW09IjFkMWQxZCIDY29sb3I9IjFkMWQxZCIDZGlzcGxheT0iYmQi4zLKPC9XYXRjagLK";
var data = Convert.FromBase64String(encStr);
// Iterate over all encodings, and decode
foreach (EncodingInfo ei in Encoding.GetEncodings())
{
Encoding e = ei.GetEncoding();
Console.Write("{0,-15} - {1}{2}", ei.CodePage, e.EncodingName, System.Environment.NewLine);
Console.WriteLine(e.GetString(data).Substring(0, 200));
Console.Write(System.Environment.NewLine + "---------------------" + System.Environment.NewLine);
}
Fiddle: https://dotnetfiddle.net/LcV5s8

MVC Convert Base64 String to Image, but ... System.FormatException

My controller is getting an uploaded image in the request object in this code:
[HttpPost]
public string Upload()
{
string fileName = Request.Form["FileName"];
string description = Request.Form["Description"];
string image = Request.Form["Image"];
return fileName;
}
The value of image (at least the beginning of it) looks a lot like this:
...
I tried to convert using the following:
byte[] bImage = Convert.FromBase64String(image);
However, that gives the System.FormatException: "The input is not a valid Base-64 string as it contains a non-base 64 character, more than two padding characters, or an illegal character among the padding characters."
I get the feeling that the problem is that at least the start of the string isn't base64, but for all I know none of it is. Do I need to parse the string before decoding it? Am I missing something completely different?
It looks like you may just be able to strip out the "data:image/jpeg;base64," part from the start. For example:
const string ExpectedImagePrefix = "data:image/jpeg;base64,";
...
if (image.StartsWith(ExpectedImagePrefix))
{
string base64 = image.Substring(ExpectedImagePrefix.Length);
byte[] data = Convert.FromBase64String(base64);
// Use the data
}
else
{
// Not in the expected format
}
Of course you may want to make this somewhat less JPEG-specific, but I'd try that as a very first pass.
The reason really is "data:image/jpeg;base64,", I'll suggest using this method for removing starting string from base64
var base64Content = image.Split(',')[1];
byte[] bImage = Convert.FromBase64String(base64Content);
This is Shortest solution and you don't have to use magic strings, or write a regex.

Bit Array to String and back to Bit Array

Possible Duplicate Converting byte array to string and back again in C#
I am using Huffman Coding for compression and decompression of some text from here
The code in there builds a huffman tree to use it for encoding and decoding. Everything works fine when I use the code directly.
For my situation, i need to get the compressed content, store it and decompress it when ever need.
The output from the encoder and the input to the decoder are BitArray.
When I tried convert this BitArray to String and back to BitArray and decode it using the following code, I get a weird answer.
Tree huffmanTree = new Tree();
huffmanTree.Build(input);
string input = Console.ReadLine();
BitArray encoded = huffmanTree.Encode(input);
// Print the bits
Console.Write("Encoded Bits: ");
foreach (bool bit in encoded)
{
Console.Write((bit ? 1 : 0) + "");
}
Console.WriteLine();
// Convert the bit array to bytes
Byte[] e = new Byte[(encoded.Length / 8 + (encoded.Length % 8 == 0 ? 0 : 1))];
encoded.CopyTo(e, 0);
// Convert the bytes to string
string output = Encoding.UTF8.GetString(e);
// Convert string back to bytes
e = new Byte[d.Length];
e = Encoding.UTF8.GetBytes(d);
// Convert bytes back to bit array
BitArray todecode = new BitArray(e);
string decoded = huffmanTree.Decode(todecode);
Console.WriteLine("Decoded: " + decoded);
Console.ReadLine();
The Output of Original code from the tutorial is:
The Output of My Code is:
Where am I wrong friends? Help me, Thanks in advance.
You cannot stuff arbitrary bytes into a string. That concept is just undefined. Conversions happen using Encoding.
string output = Encoding.UTF8.GetString(e);
e is just binary garbage at this point, it is not a UTF8 string. So calling UTF8 methods on it does not make sense.
Solution: Don't convert and back-convert to/from string. This does not round-trip. Why are you doing that in the first place? If you need a string use a round-trippable format like base-64 or base-85.
I'm pretty sure Encoding doesn't roundtrip - that is you can't encode an arbitrary sequence of bytes to a string, and then use the same Encoding to get bytes back and always expect them to be the same.
If you want to be able to roundtrip from your raw bytes to string and back to the same raw bytes, you'd need to use base64 encoding e.g.
http://blogs.microsoft.co.il/blogs/mneiter/archive/2009/03/22/how-to-encoding-and-decoding-base64-strings-in-c.aspx

Categories

Resources