In PL/SQL, how can I convert a string (a long HTML string with newlines, tags, etc.) to Base64 in a way that is easy to decode in C#?
In C# there are:
Convert.ToBase64String()
Convert.ToBase64CharArray()
BitConverter.ToString()
Which of these is compatible with PL/SQL's utl_encode.base64_encode()?
I welcome any other suggestions :-)
You'll probably want to use this method:
Convert.ToBase64String()
It returns a Base64-encoded string built from an array of unsigned 8-bit integers (bytes).
As an alternative, you can use Convert.ToBase64CharArray(), but the output is a character array, which is a bit unusual but may be useful in certain circumstances.
The method BitConverter.ToString() returns a string, but the bytes are represented in hexadecimal, not Base64.
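For example, a quick sketch of the difference in output; the byte values here are simply the UTF-8 bytes of the made-up input "Hi!":
using System;
using System.Text;

class Base64VsHexDemo
{
    static void Main()
    {
        byte[] bytes = Encoding.UTF8.GetBytes("Hi!");       // 0x48, 0x69, 0x21

        Console.WriteLine(Convert.ToBase64String(bytes));   // SGkh      (Base64)
        Console.WriteLine(BitConverter.ToString(bytes));    // 48-69-21  (hex pairs, not Base64)
    }
}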
I got it working :-)
PL/SQL
s1 varchar2(32767);
s2 varchar2(32767);
-- encode
s2 := utl_raw.cast_to_varchar2(utl_encode.base64_encode(utl_raw.cast_to_raw(s1)));
-- decode (here s1 holds the Base64 text)
s2 := utl_raw.cast_to_varchar2(utl_encode.base64_decode(utl_raw.cast_to_raw(s1)));
are compatible with this C#:
public static string ToBase64(string str)
{
return Convert.ToBase64String(Encoding.UTF8.GetBytes(str));
}
//++++++++++++++++++++++++++++++++++++++++++++++
public static string FromBase64(string str)
{
return Encoding.UTF8.GetString(Convert.FromBase64String(str));
}
hope you find it useful :-)
Related
I have a simple method (shown below) for converting a Base64 string to a normal string. Sometimes the resulting byte array b64 is UTF-8, and other times it is UTF-16. I keep reading online that C# strings are always UTF-16, but that is not the case for me here. Why is this happening, and how do I fix it?
public static string FromBase64(this string input)
{
String corrected = new string(input.ToCharArray());
byte[] b64 = Convert.FromBase64String(corrected);
if (b64[1] == 0)
{
return System.Text.Encoding.Unicode.GetString(b64);
}
else
{
return System.Text.Encoding.UTF8.GetString(b64);
}
}
The same thing is happening to my base 64 encoder:
public static string ToBase64(this string input)
{
String b64 = Convert.ToBase64String(input.GetBytes());
return b64;
}
public static byte[] GetBytes(this string str)
{
byte[] bytes = new byte[str.Length * sizeof(char)];
System.Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
return bytes;
}
Example:
On my computer, "cABhAHMAcwB3AG8AcgBkADEA" decodes to:
'p','\0','a','\0','s','\0','s','\0','w','\0','o','\0','r','\0','d','\0','1','\0'
But on my coworker's computer it is:
'p','a','s','s','w','o','r','d','1'
Edit:
I know that the string I create comes from a textbox, and that the file I am saving it to is always going to be UTF-8, so everything points to the Convert method causing my encoding switch.
Update:
After digging further, it appears that my coworker had a very important line commented out in his version of the code: the one that saves the value read from the file to the hashtable. The default value I was using is a UTF-8 Base64 value, so I am going to correct the default to a UTF-16 value; then I can clean up the code and remove any UTF-8 references.
Also, I had been naively using a UTF-8 Base64 value I had retrieved from a website, not realizing what I was getting myself into. The funny part is that I would never have found that out if my coworker hadn't commented out the line that saves the values from the file.
Final version of the code:
public static string FromBase64(this string input)
{
byte[] b64 = Convert.FromBase64String(input);
return System.Text.Encoding.Unicode.GetString(b64);
}
public static string ToBase64(this string input)
{
String b64 = Convert.ToBase64String(input.GetBytes());
return b64;
}
public static byte[] GetBytes(this string str)
{
return System.Text.Encoding.Unicode.GetBytes(str);
}
First of all, I want to debunk the title of the question:
Convert.FromBase64String() returns Unicode sometimes, or UTF-8
That is not the case. Given the same input (valid Base64-encoded text), Convert.FromBase64String() always returns the same output.
Moving on, you cannot determine definitively, just by examining the payload, the encoding used for a string. You attempt to do this with
if (b64[1] == 0)
// encoding must be UTF-16
This is not the case. The overwhelming majority of UTF-16 code units fail that test. No matter how you try to write this test, it is doomed to fail, because there exist byte arrays that are well-defined strings when interpreted as different encodings. In other words, it is possible to construct byte arrays that are valid when considered as either UTF-8 or UTF-16.
So, you have to know a priori whether the payload is encoded as UTF-16, UTF-8 or indeed some other encoding.
The solution will be to keep track of the original encoding, before the base64 encoding. Pass that information along with the base64 encoded payload. Then when you decode, you can determine which Encoding to use to decode back to a string.
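A hedged sketch of that idea; the colon-separated prefix used here is purely an illustration, not an established format:
using System;
using System.Text;

static class TaggedBase64
{
    // Prefix the payload with the encoding's name so the decoder knows which Encoding to use.
    public static string Encode(string text, Encoding encoding)
    {
        return encoding.WebName + ":" + Convert.ToBase64String(encoding.GetBytes(text));
    }

    public static string Decode(string tagged)
    {
        int sep = tagged.IndexOf(':');   // Base64 never contains ':', so the first ':' ends the tag
        Encoding encoding = Encoding.GetEncoding(tagged.Substring(0, sep));
        return encoding.GetString(Convert.FromBase64String(tagged.Substring(sep + 1)));
    }
}
With this, Decode(Encode(s, Encoding.UTF8)) and Decode(Encode(s, Encoding.Unicode)) both round-trip correctly.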
It looks to me very much as though your strings are all coming from UTF-16 .NET strings. In that case you will never have UTF-8 data, and you should always decode with UTF-16; that is, use Encoding.Unicode.GetString().
Also, the GetBytes method in your code is poor. It should be:
public static byte[] GetBytes(this string str)
{
return Encoding.Unicode.GetBytes(str);
}
Another oddity:
String corrected = new string(input.ToCharArray());
This is a no-op.
Finally, it is quite likely that your text will be more compact when encoded as UTF-8. So perhaps you should consider doing that before applying the base64 encoding.
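A quick sketch of the size difference for ASCII-heavy text; the lengths in the comments are what the two encodings produce for the made-up input "password1":
using System;
using System.Text;

class SizeDemo
{
    static void Main()
    {
        string text = "password1";   // mostly ASCII, so UTF-8 needs half the bytes of UTF-16

        string utf16B64 = Convert.ToBase64String(Encoding.Unicode.GetBytes(text)); // 24 characters
        string utf8B64  = Convert.ToBase64String(Encoding.UTF8.GetBytes(text));    // 12 characters
        Console.WriteLine("{0} vs {1}", utf16B64.Length, utf8B64.Length);          // 24 vs 12
    }
}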
Regarding your update, what you state is incorrect. This code:
string str = Encoding.Unicode.GetString(
Convert.FromBase64String("cABhAHMAcwB3AG8AcgBkADEA"));
assigns password1 to str wherever it is run.
Try revising the code to make it a little more readable/accurate. As mentioned in my comment and in David Hefferman's answer, you're trying to do things that either:
A) do nothing, or
B) demonstrate flawed logic.
The following code based upon yours works fine:
class Program
{
static void Main(string[] args)
{
string original = "password1";
string encoded = original.ToBase64();
string decoded = encoded.FromBase64();
Console.WriteLine("Original: {0}", original);
Console.WriteLine("Encoded: {0}", encoded);
Console.WriteLine("Decoded: {0}", decoded);
}
}
public static class Extensions
{
public static string FromBase64(this string input)
{
return System.Text.Encoding.Unicode.GetString(Convert.FromBase64String(input));
}
public static string ToBase64(this string input)
{
return Convert.ToBase64String(input.GetBytes());
}
public static byte[] GetBytes(this string str)
{
return System.Text.Encoding.Unicode.GetBytes(str);
}
}
What you are doing is no different from encoding data in either EBCDIC or ASCII and then trying to figure out which one was used when decoding. As you have already discovered, this does not work very well.
The only way to get this to work correctly is to have a single encoding format used by all participants. This is a fundamental concept of communications.
Pick an encoding - let's say UTF-8 - and use it for all transformations between String and byte[]. This will ensure that you have accurate knowledge of the format of the payload and how to deal with it, as David Tanner has been telling you.
Here's the basic form:
public static string ToBase64(this string self)
{
byte[] bytes = Encoding.UTF8.GetBytes(self);
return Convert.ToBase64String(bytes);
}
public static string FromBase64(this string self)
{
byte[] bytes = Convert.FromBase64String(self);
return Encoding.UTF8.GetString(bytes);
}
Regardless of whatever weirdness might be happening between your computers, this code will produce the same encoded strings.
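For example, a quick round-trip using those extension methods (assuming they are in scope); the encoded value shown is what UTF-8 Base64 yields for this input:
string encoded = "password1".ToBase64();   // "cGFzc3dvcmQx"
string decoded = encoded.FromBase64();     // "password1"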
I need to convert a string into its binary equivalent and keep it in a string, then turn it back into its ASCII equivalent.
You can encode a string into a byte-wise representation by using an Encoding, e.g. UTF-8:
var str = "Out of cheese error";
var bytes = Encoding.UTF8.GetBytes(str);
To get back a .NET string object:
var strAgain = Encoding.UTF8.GetString(bytes);
// str == strAgain
You seem to want the representation as a series of '1' and '0' characters; I'm not sure why you do, but that's possible too:
var binStr = string.Join("", bytes.Select(b => Convert.ToString(b, 2).PadLeft(8, '0')));  // pad each byte to 8 digits so the string can be split apart again
Encodings take an abstract string (in the sense that they're an opaque representation of a series of Unicode code points), and map them into a concrete series of bytes. The bytes are meaningless (again, because they're opaque) without the encoding. But, with the encoding, they can be turned back into a string.
You seem to be mixing up "ASCII" with strings; ASCII is simply an encoding that deals only with code points below 128. If you have a string containing an 'é', for example, it has no ASCII representation, and so most definitely cannot be represented using a series of ASCII bytes, even though it can exist peacefully in a .NET string object.
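A small illustration of that point; the byte values in the comments are the standard ASCII and UTF-8 results for "café":
using System;
using System.Text;

class EncodingDemo
{
    static void Main()
    {
        byte[] ascii = Encoding.ASCII.GetBytes("café");  // 'é' is replaced by '?' (0x3F)
        byte[] utf8  = Encoding.UTF8.GetBytes("café");   // 'é' becomes two bytes: 0xC3 0xA9

        Console.WriteLine(BitConverter.ToString(ascii)); // 63-61-66-3F
        Console.WriteLine(BitConverter.ToString(utf8));  // 63-61-66-C3-A9
    }
}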
See this article by Joel Spolsky for further reading.
You can use these functions for converting to binary and restoring it back. To turn a binary string back into text:
public static string BinaryToString(string data)
{
List<Byte> byteList = new List<Byte>();
for (int i = 0; i < data.Length; i += 8)
{
byteList.Add(Convert.ToByte(data.Substring(i, 8), 2));
}
return Encoding.ASCII.GetString(byteList.ToArray());
}
and to convert a string to binary:
public static string StringToBinary(string data)
{
StringBuilder sb = new StringBuilder();
foreach (char c in data.ToCharArray())
{
sb.Append(Convert.ToString(c, 2).PadLeft(8, '0'));
}
return sb.ToString();
}
Hope it helps.
First convert the string into bytes, as described in my comment and in Cameron's answer; then iterate, convert each byte into an 8-digit binary number (possibly with Convert.ToString, padding appropriately), then concatenate. For the reverse direction, split by 8 characters, run through Convert.ToInt16, build up a byte array, then convert back to a string with GetString.
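A minimal sketch of that recipe, using Convert.ToByte rather than Convert.ToInt16 for the 8-bit chunks and assuming UTF-8 as the encoding:
using System;
using System.Linq;
using System.Text;

static class BinaryStringSketch
{
    // Encode: string -> bytes -> 8-digit binary chunks, concatenated.
    public static string ToBinaryString(string text)
    {
        return string.Concat(Encoding.UTF8.GetBytes(text)
            .Select(b => Convert.ToString(b, 2).PadLeft(8, '0')));
    }

    // Decode: split into 8-character chunks -> bytes -> string.
    public static string FromBinaryString(string bits)
    {
        byte[] bytes = Enumerable.Range(0, bits.Length / 8)
            .Select(i => Convert.ToByte(bits.Substring(i * 8, 8), 2))
            .ToArray();
        return Encoding.UTF8.GetString(bytes);
    }
}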
We created a unit test that uses the following methods to generate random UTF8 text:
private static Random _rand = new Random(Environment.TickCount);
public static byte CreateByte()
{
return (byte)_rand.Next(byte.MinValue, byte.MaxValue + 1);
}
public static byte[] CreateByteArray(int length)
{
return Repeat(CreateByte, length).ToArray();
}
public static string CreateUtf8String(int length)
{
return Encoding.UTF8.GetString(CreateByteArray(length));
}
private static IEnumerable<T> Repeat<T>(Func<T> func, int count)
{
for (int i = 0; i < count; i++)
{
yield return func();
}
}
When we send the random UTF8 strings to our business logic, XmlWriter writes the generated string and can fail with this error:
Test method UnitTest.Utf8 threw exception:
System.ArgumentException: ' ', hexadecimal value 0x0E, is an invalid character.
System.Xml.XmlUtf8RawTextWriter.InvalidXmlChar(Int32 ch, Byte* pDst, Boolean entitize)
System.Xml.XmlUtf8RawTextWriter.WriteAttributeTextBlock(Char* pSrc, Char* pSrcEnd)
System.Xml.XmlUtf8RawTextWriter.WriteString(String text)
System.Xml.XmlUtf8RawTextWriterIndent.WriteString(String text)
System.Xml.XmlWellFormedWriter.WriteString(String text)
System.Xml.XmlWriter.WriteAttributeString(String localName, String value)
We want to support any possible string to be passed in, and need these invalid characters escaped somehow.
XmlWriter already escapes things like &, <, >, etc., how can we deal with other invalid characters such as control characters, etc?
PS - let me know if our UTF8 generator is flawed (I'm already seeing where I shouldn't let it generate '\0')
The XmlConvert class has a lot of useful methods (like EncodeName, IsXmlChar, ...) for making sure you're building valid XML.
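For instance, a hedged sketch that uses IsXmlChar (available in .NET 4.0 and later) to drop anything XML 1.0 cannot carry before handing the string to XmlWriter; the helper name is mine:
using System.Linq;
using System.Xml;

static class XmlSanitizer
{
    // Keep only characters that are legal in an XML 1.0 document.
    // Note: this also drops surrogate halves, so supplementary characters
    // would need extra handling via XmlConvert.IsXmlSurrogatePair.
    public static string StripInvalidXmlChars(string text)
    {
        return new string(text.Where(XmlConvert.IsXmlChar).ToArray());
    }
}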
There are two problems:
Not all characters are valid for XML, even escaped. For XML 1.0, the only characters with a Unicode code point value less than 0x0020 that are valid are TAB (0x09), LF (0x0A), and CR (0x0D). See XML 1.0, Section 2.2, Characters.
For XML 1.1, which relatively few systems support, any character except NUL can be escaped in this manner.
Not all sequences of bytes are valid UTF-8. For example, according to the specification, "The octet values C0, C1, F5 to FF never appear." Probably you would be better off just creating strings of characters and ignoring UTF-8, or creating the string and then converting it to UTF-8 and back if you're really into encodings.
Your UTF-8 generator appears to be flawed. There are many byte sequences which are invalid UTF-8 encodings.
A better way to generate valid random UTF-8 encodings is to generate random characters, put them into a string and then encode the string to UTF-8.
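A hedged sketch of that approach, skipping the surrogate range so every generated code point is a valid Unicode scalar (the class and method names are mine):
using System;
using System.Text;

static class RandomUtf8
{
    private static readonly Random _rand = new Random();

    // Build a random string from valid code points, then encode it as UTF-8.
    public static byte[] CreateUtf8Bytes(int charCount)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < charCount; i++)
        {
            int cp;
            do
            {
                cp = _rand.Next(0x0001, 0x10FFFF + 1);
            } while (cp >= 0xD800 && cp <= 0xDFFF);   // skip UTF-16 surrogate halves
            sb.Append(char.ConvertFromUtf32(cp));
        }
        // Note: this still allows control characters, which XML 1.0 rejects;
        // restrict the range further if the text must also be XML-safe.
        return Encoding.UTF8.GetBytes(sb.ToString());
    }
}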
Mark points out that not every byte sequence is a valid UTF-8 sequence.
I'd like to add that not every character can exist in an XML document. Only some characters are valid, and this is true even if they are encoded as a numeric character reference.
Update: If you want to encode arbitrary binary data in XML, then use Base64 or some other encoding before writing it to XML.
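A minimal sketch of that, using XmlWriter's built-in WriteBase64 for element content (the element name and byte values are made up):
using System;
using System.Xml;

class Base64XmlDemo
{
    static void Main()
    {
        byte[] data = { 0x0E, 0x00, 0xFF };   // bytes that are not valid XML text

        using (XmlWriter writer = XmlWriter.Create(Console.Out))
        {
            writer.WriteStartElement("payload");
            writer.WriteBase64(data, 0, data.Length);   // emits "DgD/"
            writer.WriteEndElement();
        }
    }
}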
How do I convert from unicode to single byte in C#?
This does not work:
int level =1;
string argument;
// and then argument is assigned
if (argument[2] == Convert.ToChar(level))
{
// does not work
}
And this:
char test1 = argument[2];
char test2 = Convert.ToChar(level);
produces funky results: test1 can be 49 ('1') while test2 will be 1 (the unprintable control character '\u0001').
How do I convert from unicode to single byte in C#?
This question makes no sense, and the sample code just makes things worse.
Unicode is a mapping from characters to code points. The code points are numbered from 0x0 to 0x10FFFF, which is far more values than can be stored in a single byte.
And the sample code has an int, a string, and a char. There are no bytes anywhere.
What are you really trying to do?
Use UnicodeEncoding.GetBytes().
UnicodeEncoding unicode = new UnicodeEncoding();
Byte[] encodedBytes = unicode.GetBytes(unicodeString);
char and string are always Unicode in .NET. You can't do it the way you're trying.
In fact, what are you trying to accomplish?
If you want to test whether the int level matches the char argument[2], then use:
if (argument[2] == Convert.ToChar(level + (int)'0'))
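A quick check of that comparison (the input string here is made up); it works because the digit characters '0' through '9' are contiguous code points:
using System;

class DigitCompareDemo
{
    static void Main()
    {
        int level = 1;
        string argument = "ab1";   // hypothetical input: the third character is the digit '1'

        // '0'..'9' are contiguous, so level + '0' yields the matching digit character.
        Console.WriteLine(argument[2] == Convert.ToChar(level + (int)'0'));   // True
    }
}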
I'm working with C# .NET.
I would like to know how to convert a Unicode code point given as a string like "\u1D0EC"
(note that it's above "\uFFFF") to its symbol: "𝃬"
Thanks in advance!
That Unicode codepoint is encoded in UTF32. .NET and Windows encode Unicode in UTF16, you'll have to translate. UTF16 uses "surrogate pairs" to handle codepoints above 0xffff, a similar kind of approach as UTF8. The first code of the pair is 0xd800..dbff, the second code is 0xdc00..dfff. Try this sample code to see that at work:
using System;
using System.Text;
class Program {
static void Main(string[] args) {
uint utf32 = uint.Parse("1D0EC", System.Globalization.NumberStyles.HexNumber);
string s = Encoding.UTF32.GetString(BitConverter.GetBytes(utf32));
foreach (char c in s.ToCharArray()) {
Console.WriteLine("{0:X}", (uint)c);
}
Console.ReadLine();
}
}
Convert each sequence with int.Parse(String, NumberStyles) and char.ConvertFromUtf32:
// requires: using System.Globalization;
string s = @"\U1D0EC";
string converted = char.ConvertFromUtf32(int.Parse(s.Substring(2), NumberStyles.HexNumber));
I have recently pushed my FOSS Unicode Converter to CodePlex (http://unicode.codeplex.com).
You can convert whatever you want to hex code and get the right character back from hex code; there is also a full character information database.
I use this code:
public static char ConvertHexToUnicode(string hexCode)
{
if (hexCode != string.Empty)
return ((char)int.Parse(hexCode, NumberStyles.AllowHexSpecifier));
char empty = new char();
return empty;
}//end
You can see the entire code at http://unicode.codeplex.com/
It appears you just want this in your code... you can type it as a string literal using the escape code \Uxxxxxxxx (note that this is a capital U, and there must be 8 digits). For this example, it would be: "\U0001D0EC".
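A tiny illustration of that literal; the Length of 2 reflects the UTF-16 surrogate pair used to store it:
using System;

class EscapeDemo
{
    static void Main()
    {
        string symbol = "\U0001D0EC";   // capital U, eight hex digits
        Console.WriteLine(symbol);          // prints the symbol (font permitting)
        Console.WriteLine(symbol.Length);   // 2: stored as a UTF-16 surrogate pair
        Console.WriteLine(char.ConvertToUtf32(symbol, 0).ToString("X"));   // 1D0EC
    }
}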