I am trying an experiment: converting a base64 string to a string and then back to a base64 string. However, I am not getting my original base64 string back:
String profilepic = "/9j/4AAQ";
string Orig = System.Text.Encoding.Unicode.GetString(Convert.FromBase64String(profilepic));
string New = Convert.ToBase64String(System.Text.Encoding.Unicode.GetBytes(Orig));
The string New returns "/f//4AAQ".
Any thoughts on why this is happening?
You are doing it wrong. You should do it as below:
namespace ConsoleApplication1
{
using System;
using System.Text;
class Program
{
static void Main(string[] args)
{
string profilepic = "/9j/4AAQ";
string New = Convert.ToBase64String(Encoding.Unicode.GetBytes(profilepic));
byte[] raw = Convert.FromBase64String(New); // unpack the base-64 to a blob
string s = Encoding.Unicode.GetString(raw); // outputs /9j/4AAQ
Console.ReadKey();
}
}
}
You're assuming that the base64-encoded binary data in your example contains a UTF-16 encoded message. That simply isn't the case here, and the System.Text.Encoding.Unicode class will alter the contents by substituting the replacement character (U+FFFD) for byte sequences it can't interpret.
Therefore, base64-encoding the UTF-16 byte stream of the returned string may not yield the same output.
Your input string contains the binary sequence 0xFF 0xD8 0xFF 0xE0 0x00 0x10 (in hex), which is the start of a JPEG header. Interpreting this as UTF-16LE (which is what System.Text.Encoding.Unicode does), the first code unit is 0xD8FF, an unpaired high surrogate. It is therefore placed in the string as the replacement character 0xFFFD, which explains the change.
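You can watch the substitution happen; a minimal sketch, using the same input as the question:
byte[] raw = Convert.FromBase64String("/9j/4AAQ"); // FF D8 FF E0 00 10
string s = Encoding.Unicode.GetString(raw);        // 0xD8FF is an unpaired surrogate...
Console.WriteLine((int)s[0] == 0xFFFD);            // True: ...so it became U+FFFD
Console.WriteLine(Convert.ToBase64String(Encoding.Unicode.GetBytes(s))); // /f//4AAQ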
I tried decoding it with Encoding.Unicode, Encoding.UTF8 and Encoding.Default, but none of them yielded anything intelligible.
Related
I'm trying to convert a mis-decoded UTF-8 string to UTF-16, because I'm going to read a file and the text comes out like the string str below.
For example, the input would be the string "Não é possÃvel equipar", and the output I need is "Não é possível equipar".
static void Main(string[] args)
{
test3();
Console.ReadKey();
}
static void test3()
{
string str = "Não é possÃvel equipar";
string strUTF16 = Utf8ToUtf16(str);
Console.WriteLine(str);
Console.WriteLine(strUTF16);
}
static string Utf8ToUtf16(string utf8String)
{
byte[] utf8Bytes = Encoding.UTF8.GetBytes(utf8String);
byte[] unicodeBytes = Encoding.Convert(Encoding.UTF8, Encoding.Unicode, utf8Bytes);
return Encoding.Unicode.GetString(unicodeBytes);
}
I really don't know how to solve this. Any tips?
If you want to read a file then you should read a file. When you read the file, specify the encoding of that file. If I'm not mistaken UTF8 is the default, so reading files encoded with UTF8 doesn't require the encoding to be specified. If you want to save that text to a file with a specific encoding, specify that encoding when saving the file.
var text = File.ReadAllText(filePath, Encoding.UTF8);
File.WriteAllText(filePath, text, Encoding.Unicode);
That will effectively convert a file from UTF8 encoding to UTF16. A more verbose version would be:
var data = File.ReadAllBytes(filePath);
var text = Encoding.UTF8.GetString(data);
data = Encoding.Unicode.GetBytes(text);
File.WriteAllBytes(filePath, data);
Your Utf8ToUtf16() function is effectively a no-op. You are taking an arbitrary UTF-16 string as input, encoding it into UTF-8 bytes, then decoding those bytes as UTF-8 back into UTF-16. So, you effectively end up with the same string value you started with. You may as well have just written the following, the result would be the same:
static string Utf8ToUtf16(string utf8String)
{
return utf8String;
}
That being said, "Não é possÃvel equipar" is what you get when the UTF-8 encoded form of "Não é possível equipar" is misinterpreted as Latin-1 (ISO-8859-1) or Windows-125x etc., instead of being properly decoded as UTF-8 to begin with.
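A minimal sketch of how that kind of mojibake arises in the first place (assuming ISO-8859-1 as the wrong decoder):
byte[] utf8Bytes = Encoding.UTF8.GetBytes("Não é possível equipar");        // correct UTF-8 bytes
string mojibake = Encoding.GetEncoding("ISO-8859-1").GetString(utf8Bytes); // wrong decoder
Console.WriteLine(mojibake); // every accented character turns into an "Ã"-style pair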
If you have a C# string that contains such UTF-8 bytes which were up-scaled as-is to UTF-16 (why???), then you need to down-scale those characters as-is back into 8-bit bytes, and then you can decode those bytes as UTF-8, eg:
static void test3()
{
string str = "Não é possÃvel equipar";
string strUTF16 = Utf8ToUtf16(str);
Console.WriteLine(str);
Console.WriteLine(strUTF16);
}
static string Utf8ToUtf16(string utf8String)
{
byte[] utf8Bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(utf8String); // or: GetEncoding(28591)
return Encoding.UTF8.GetString(utf8Bytes);
}
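Note: on .NET 5 and later you can also write Encoding.Latin1 instead of Encoding.GetEncoding("ISO-8859-1").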
I'm having a problem generating a base64 string with a specific encoding.
I have an application that generates this base64 string:
RQA6AFwAUAByAG8AagBlAGMAdABzAFwAWQBvAHUAdAB1AGIAZQAuAE0AYQBuAGEAZwBlAHIAXABZAG8AdQB0AHUAYgBlAC4ATQBhAG4AYQBnAGUAcgAuAE0AbwBkAGUAbABzAC4AQwBvAG4AdABhAGkAbgBlAHIAXABvAGIAagBcAFIAZQBsAGUAYQBzAGUAXABuAGUAdABzAHQAYQBuAGQAYQByAGQAMgAuADAAXABZAG8AdQB0AHUAYgBlAC4ATQBhAG4AYQBnAGUAcgAuAE0AbwBkAGUAbABzAC4AQwBvAG4AdABhAGkAbgBlAHIALgBkAGwAbAAAAA==
which is equal to
E:\Projects\Youtube.Manager\Youtube.Manager.Models.Container\obj\Release\netstandard2.0\Youtube.Manager.Models.Container.dll
Now I'm trying to convert
E:\Projects\Youtube.Manager\Youtube.Manager.Models.Container\obj\Release\netstandard2.0\Youtube.Manager.Models.Container.dll
to a base64 string, but I'm getting this instead:
RTpcUHJvamVjdHNcWW91dHViZS5NYW5hZ2VyXFlvdXR1YmUuTWFuYWdlci5Nb2RlbHMuQ29udGFpbmVyXG9ialxSZWxlYXNlXG5ldHN0YW5kYXJkMi4wXFlvdXR1YmUuTWFuYWdlci5Nb2RlbHMuQ29udGFpbmVyLmRsbA==
I want to get the same result as the first base64 string, which is:
RQA6AFwAUAByAG8AagBlAGMAdABzAFwAWQBvAHUAdAB1AGIAZQAuAE0AYQBuAGEAZwBlAHIAXABZAG8AdQB0AHUAYgBlAC4ATQBhAG4AYQBnAGUAcgAuAE0AbwBkAGUAbABzAC4AQwBvAG4AdABhAGkAbgBlAHIAXABvAGIAagBcAFIAZQBsAGUAYQBzAGUAXABuAGUAdABzAHQAYQBuAGQAYQByAGQAMgAuADAAXABZAG8AdQB0AHUAYgBlAC4ATQBhAG4AYQBnAGUAcgAuAE0AbwBkAGUAbABzAC4AQwBvAG4AdABhAGkAbgBlAHIALgBkAGwAbAAAAA==
How can I do it?
This is my code, which is giving me the wrong result:
var bytes = Encoding.ASCII.GetBytes(msg);
return Convert.ToBase64String(bytes);
The problem here is the text encoding you're using.
The first Base64 string you posted is encoded using UTF-16 (what .NET calls Encoding.Unicode) with a nul terminator byte pair. The trailing 'AAAAA==' is a dead giveaway here. You can see it for yourself by examining the byte array:
var originalB64 = "RQA6AFwAUAByAG8AagBlAGMAdABzAFwAWQBvAHUAdAB1AGIAZQAuAE0AYQBuAGEAZwBlAHIAXABZAG8AdQB0AHUAYgBlAC4ATQBhAG4AYQBnAGUAcgAuAE0AbwBkAGUAbABzAC4AQwBvAG4AdABhAGkAbgBlAHIAXABvAGIAagBcAFIAZQBsAGUAYQBzAGUAXABuAGUAdABzAHQAYQBuAGQAYQByAGQAMgAuADAAXABZAG8AdQB0AHUAYgBlAC4ATQBhAG4AYQBnAGUAcgAuAE0AbwBkAGUAbABzAC4AQwBvAG4AdABhAGkAbgBlAHIALgBkAGwAbAAAAA==";
var bytes = Convert.FromBase64String(originalB64);
Converting this to a string will give you a null-terminated string 125 characters long, with the last character being nul.
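A quick way to see the pattern (reusing the bytes array from above):
Console.WriteLine(bytes.Length);   // 250: 125 UTF-16 code units, including the trailing nul
Console.WriteLine((char)bytes[0]); // 'E', the low byte of the first character
Console.WriteLine(bytes[1]);       // 0, the high byte; every second byte is zero for ASCII-range text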
Given a path that is not nul-terminated you can reproduce that string as follows:
string path = #"E:\Projects\Youtube.Manager\Youtube.Manager.Models.Container\obj\Release\netstandard2.0\Youtube.Manager.Models.Container.dll";
string newB64 = Convert.ToBase64String(Encoding.Unicode.GetBytes(path + "\0"));
This matches the original Base64 string exactly in my tests.
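To double-check, compare the two directly (assuming the originalB64 variable from the earlier snippet):
Console.WriteLine(newB64 == originalB64); // True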
I have a text file stored locally. I want to store string data in binary format there and then retrieve the data again. In the following code snippet, I have done the conversion.
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
class ConsoleApplication
{
const string fileName = "AppSettings.dat";
static void Main()
{
string someText = "settings";
byte[] byteArray = Encoding.UTF8.GetBytes(someText);
int byteArrayLength = byteArray.Length;
using (BinaryWriter writer = new BinaryWriter(File.Open(fileName, FileMode.Create)))
{
writer.Write(someText); // note: BinaryWriter.Write(string) writes a length prefix before the UTF-8 bytes
}
byte[] x = new byte[byteArrayLength];
if (File.Exists(fileName))
{
using (BinaryReader reader = new BinaryReader(File.Open(fileName, FileMode.Open)))
{
x = reader.ReadBytes(byteArrayLength);
}
string str = Encoding.UTF8.GetString(x);
Console.Write(str);
Console.ReadKey();
}
}
}
In the AppSettings.dat file the bytes end up human-readable: opening the file in a text editor shows the text "settings".
But when I assign some random values to a byte array and save it to a file using BinaryWriter, as in the following code snippet:
const string fileName = "AppSettings.dat";
static void Main()
{
byte[] array = new byte[8];
Random random = new Random();
random.NextBytes(array);
using (BinaryWriter writer = new BinaryWriter(File.Open(fileName, FileMode.Create)))
{
writer.Write(array);
}
}
It actually saves the data in unreadable binary form: opening the file in a text editor shows only gibberish.
I don't understand why, in the first case, the byte data converted from a string is stored in human-readable form, when what I want is the unreadable byte format of the second case. What's the explanation for this?
Is there any way I can store string data in binary format without a brute-force approach?
FYI, I don't want to keep the data as a Base64 string; I want it in binary format.
If security isn't a concern, and you just don't want the average user to find your data while meddling in the settings files, a simple XOR will do:
const string fileName = "AppSettings.dat";
static void Main()
{
string someText = "settings";
byte[] byteArray = Encoding.UTF8.GetBytes(someText);
for (int i = 0; i < byteArray.Length; i++)
{
byteArray[i] ^= 255;
}
File.WriteAllBytes(fileName, byteArray);
if (File.Exists(fileName))
{
var x = File.ReadAllBytes(fileName);
for (int i = 0; i < x.Length; i++)
{
x[i] ^= 255;
}
string str = Encoding.UTF8.GetString(x);
Console.Write(str);
Console.ReadKey();
}
}
It takes advantage of an interesting property of character encoding:
ASCII only defines the 0-127 range, which contains the most-used characters (a to z, 0 to 9); the 128-255 range is left to "extended ASCII" code pages for special symbols and accented letters
For compatibility reasons, UTF-8 maps the 0-127 range to the same characters as ASCII, while bytes in the 128-255 range have a special meaning (they mark characters encoded as multiple bytes)
All I do is flip every bit of each byte (XOR with 255), which among other things flips the high bit. Therefore, everything in the 0-127 range ends up in the 128-255 range, and vice versa. Thanks to the property described above, no matter whether a text reader tries to parse the file as ASCII or as UTF-8, it will only get gibberish.
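For instance, here is what the flip does to a single byte (taking the 's' from "settings"):
byte b = (byte)'s';             // 0x73, in the 0-127 ASCII range
byte flipped = (byte)(b ^ 255); // 0x8C, in the 128-255 range; a stray UTF-8 continuation byte, gibberish to a decoder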
Please note that, while it doesn't produce human-readable content, it isn't secure at all. Don't use it to store sensitive data.
Notepad just reads your binary data and decodes it as UTF-8 text.
This code snippet would give you the same result.
byte[] randomBytes = new byte[20];
Random rand = new Random();
rand.NextBytes(randomBytes);
Console.WriteLine(Encoding.UTF8.GetString(randomBytes));
If you want to stop people from converting your data back to a string, then you need to encrypt your data.
They will still be able to open the file in a text editor, because the editor just renders your encrypted bytes as UTF-8 text, but they can't convert it back to usable data unless they have the key to decrypt it.
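As a rough idea of what that looks like, here is a minimal AES sketch. This is an illustration only, not production code: it assumes .NET 6+ APIs (RandomNumberGenerator.GetBytes, Aes.EncryptCbc), and real use would need proper key management, an IV storage format, and authentication:
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;
class EncryptedSettings
{
    static void Main()
    {
        byte[] key = RandomNumberGenerator.GetBytes(32); // in practice, load the key from somewhere safe
        byte[] plain = Encoding.UTF8.GetBytes("settings");
        using (Aes aes = Aes.Create())
        {
            aes.Key = key;
            byte[] cipher = aes.EncryptCbc(plain, aes.IV); // aes.IV is generated automatically
            // store the IV alongside the ciphertext so the file can be decrypted later
            byte[] fileData = new byte[aes.IV.Length + cipher.Length];
            aes.IV.CopyTo(fileData, 0);
            cipher.CopyTo(fileData, aes.IV.Length);
            File.WriteAllBytes("AppSettings.dat", fileData);
        }
    }
}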
Sometimes the byte array b64 is UTF-8, and other times it is UTF-16. I keep reading online that C# strings are always UTF-16, but that does not seem to be the case for me here. Why is this happening, and how do I fix it? I have a simple method for converting a base64 string to a normal string:
public static string FromBase64(this string input)
{
String corrected = new string(input.ToCharArray());
byte[] b64 = Convert.FromBase64String(corrected);
if (b64[1] == 0)
{
return System.Text.Encoding.Unicode.GetString(b64);
}
else
{
return System.Text.Encoding.UTF8.GetString(b64);
}
}
The same thing is happening in my base64 encoder:
public static string ToBase64(this string input)
{
String b64 = Convert.ToBase64String(input.GetBytes());
return b64;
}
public static byte[] GetBytes(this string str)
{
byte[] bytes = new byte[str.Length * sizeof(char)];
System.Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
return bytes;
}
Example:
On my computer, "cABhAHMAcwB3AG8AcgBkADEA" decodes to:
'p','\0','a','\0','s','\0','s','\0','w','\0','o','\0','r','\0','d','\0','1','\0'
But on my coworkers computer it is:
'p','a','s','s','w','o','r','d','1'
Edit:
I know that the string I create comes from a textbox, and that the file I am saving it to is always going to be UTF-8, so everything points to the Convert method causing my encoding switch.
Update:
After digging in further, it appears that my coworker had a very important line commented out in his version of the code: the one that saves the value read from the file to the hashtable. The default value I was using is a UTF-8 base64 value, so I am going to correct the default to a UTF-16 value; then I can clean up the code, removing any UTF-8 references.
Also, I had been naively using a UTF-8 base64 encoding I had retrieved from a website, not realizing what I was getting myself into. The funny part is that I would never have found that fact if my coworker hadn't commented out the line that saves the values from the file.
Final version of the code:
public static string FromBase64(this string input)
{
byte[] b64 = Convert.FromBase64String(input);
return System.Text.Encoding.Unicode.GetString(b64);
}
public static string ToBase64(this string input)
{
String b64 = Convert.ToBase64String(input.GetBytes());
return b64;
}
public static byte[] GetBytes(this string str)
{
return System.Text.Encoding.Unicode.GetBytes(str);
}
First of all, I want to debunk the title of the question:
Convert.FromBase64String() returns Unicode sometimes, or UTF-8
That is not the case. Given the same input (valid base64-encoded text), Convert.FromBase64String() always returns the same output.
Moving on, you cannot determine definitively, just by examining the payload, the encoding used for a string. You attempt to do this with
if (b64[1] == 0)
// encoding must be UTF-16
This is not the case. The overwhelming majority of UTF-16 code units fail that test. It does not matter how you try to write this test; it is doomed to fail, because there exist byte arrays that decode to well-defined strings under several different encodings. In other words, it is possible to construct byte arrays that are valid when interpreted as either UTF-8 or UTF-16.
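A small illustration with made-up bytes; both decodes below succeed, so the payload alone cannot tell you which encoding was intended:
byte[] payload = { 0x41, 0x42, 0x43, 0x44 };
Console.WriteLine(Encoding.UTF8.GetString(payload));    // "ABCD"
Console.WriteLine(Encoding.Unicode.GetString(payload)); // "䉁䑃", two CJK characters, equally "valid"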
So, you have to know a priori whether the payload is encoded as UTF-16, UTF-8 or indeed some other encoding.
The solution will be to keep track of the original encoding, before the base64 encoding. Pass that information along with the base64 encoded payload. Then when you decode, you can determine which Encoding to use to decode back to a string.
It looks to me very much like your strings are all coming from UTF-16 .NET strings. In that case you won't ever have UTF-8 strings, and should always decode with UTF-16. That is, use Encoding.Unicode.GetString().
Also, the GetBytes method in your code is poor. It should be:
public static byte[] GetBytes(this string str)
{
return Encoding.Unicode.GetBytes(str);
}
Another oddity:
String corrected = new string(input.ToCharArray());
This is a no-op.
Finally, it is quite likely that your text will be more compact when encoded as UTF-8. So perhaps you should consider doing that before applying the base64 encoding.
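For example, for ASCII-only text UTF-8 halves the payload before the base64 step (using the password string from the question):
string s = "password1";
Console.WriteLine(Convert.ToBase64String(Encoding.Unicode.GetBytes(s))); // "cABhAHMAcwB3AG8AcgBkADEA" (24 chars)
Console.WriteLine(Convert.ToBase64String(Encoding.UTF8.GetBytes(s)));    // "cGFzc3dvcmQx" (12 chars)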
Regarding your update, what you state is incorrect. This code:
string str = Encoding.Unicode.GetString(
Convert.FromBase64String("cABhAHMAcwB3AG8AcgBkADEA"));
assigns password1 to str wherever it is run.
Try revising the code to make it a little more readable/accurate. As mentioned in my comment and in David Hefferman's answer, you're trying to do things that either:
A) do nothing
or
B) demonstrate flawed logic
The following code based upon yours works fine:
class Program
{
static void Main(string[] args)
{
string original = "password1";
string encoded = original.ToBase64();
string decoded = encoded.FromBase64();
Console.WriteLine("Original: {0}", original);
Console.WriteLine("Encoded: {0}", encoded);
Console.WriteLine("Decoded: {0}", decoded);
}
}
public static class Extensions
{
public static string FromBase64(this string input)
{
return System.Text.Encoding.Unicode.GetString(Convert.FromBase64String(input));
}
public static string ToBase64(this string input)
{
return Convert.ToBase64String(input.GetBytes());
}
public static byte[] GetBytes(this string str)
{
return System.Text.Encoding.Unicode.GetBytes(str);
}
}
What you are doing is no different than encoding data in either EBCDIC or ASCII, then trying to figure out which was used during the decode. As you have already discovered, this is not going to work very well.
The only way to get this to work correctly is to have a single encoding format used by all participants. This is a fundamental concept of communications.
Pick an encoding - let's say UTF-8 - and use it for all transformations between String and byte[]. This will ensure that you have accurate knowledge of the format of the payload and how to deal with it, as David Tanner has been telling you.
Here's the basic form:
public static string ToBase64(this string self)
{
byte[] bytes = Encoding.UTF8.GetBytes(self);
return Convert.ToBase64String(bytes);
}
public static string FromBase64(this string self)
{
byte[] bytes = Convert.FromBase64String(self);
return Encoding.UTF8.GetString(bytes);
}
Regardless of whatever weirdness might be happening between your computers, this code will produce the same encoded strings.
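A quick sanity check of these UTF-8 extension methods:
Console.WriteLine("password1".ToBase64());      // prints "cGFzc3dvcmQx"
Console.WriteLine("cGFzc3dvcmQx".FromBase64()); // prints "password1"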
Possible Duplicate: Converting byte array to string and back again in C#
I am using Huffman Coding for compression and decompression of some text from here
The code in there builds a huffman tree to use it for encoding and decoding. Everything works fine when I use the code directly.
For my situation, I need to get the compressed content, store it, and decompress it whenever needed.
The output from the encoder and the input to the decoder are BitArray.
When I tried converting this BitArray to a String, back to a BitArray, and decoding it using the following code, I got a weird answer.
string input = Console.ReadLine();
Tree huffmanTree = new Tree();
huffmanTree.Build(input);
BitArray encoded = huffmanTree.Encode(input);
// Print the bits
Console.Write("Encoded Bits: ");
foreach (bool bit in encoded)
{
Console.Write((bit ? 1 : 0) + "");
}
Console.WriteLine();
// Convert the bit array to bytes
Byte[] e = new Byte[(encoded.Length / 8 + (encoded.Length % 8 == 0 ? 0 : 1))];
encoded.CopyTo(e, 0);
// Convert the bytes to string
string output = Encoding.UTF8.GetString(e);
// Convert string back to bytes
e = Encoding.UTF8.GetBytes(output);
// Convert bytes back to bit array
BitArray todecode = new BitArray(e);
string decoded = huffmanTree.Decode(todecode);
Console.WriteLine("Decoded: " + decoded);
Console.ReadLine();
The original code from the tutorial decodes the text correctly, but the output of my code is garbled.
Where am I wrong, friends? Help me, thanks in advance.
You cannot stuff arbitrary bytes into a string. That concept is just undefined. Conversions happen using Encoding.
string output = Encoding.UTF8.GetString(e);
e is just binary garbage at this point; it is not a UTF-8 string. So calling UTF-8 methods on it does not make sense.
Solution: Don't convert to a string and back. This does not round-trip. Why are you doing that in the first place? If you need a string, use a round-trippable format like base-64 or base-85.
I'm pretty sure Encoding doesn't round-trip; that is, you can't decode an arbitrary sequence of bytes to a string and then use the same Encoding to get the bytes back and always expect them to be the same.
If you want to be able to roundtrip from your raw bytes to string and back to the same raw bytes, you'd need to use base64 encoding e.g.
http://blogs.microsoft.co.il/blogs/mneiter/archive/2009/03/22/how-to-encoding-and-decoding-base64-strings-in-c.aspx
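Along those lines, here is a minimal sketch of round-tripping the bits through base64 instead (encoded and huffmanTree stand for the variables from the question; note that you must also keep the exact bit count, because the last byte gets padded):
byte[] packed = new byte[(encoded.Length + 7) / 8];
encoded.CopyTo(packed, 0);
string stored = Convert.ToBase64String(packed); // safe to store or send as text
byte[] unpacked = Convert.FromBase64String(stored);
BitArray todecode = new BitArray(unpacked);
todecode.Length = encoded.Length;               // trim the padding bits back off
string decoded = huffmanTree.Decode(todecode);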