Possible Duplicate Converting byte array to string and back again in C#
I am using Huffman Coding for compression and decompression of some text from here
The code in there builds a huffman tree to use it for encoding and decoding. Everything works fine when I use the code directly.
For my situation, i need to get the compressed content, store it and decompress it when ever need.
The output from the encoder and the input to the decoder are BitArray.
When I tried convert this BitArray to String and back to BitArray and decode it using the following code, I get a weird answer.
Tree huffmanTree = new Tree();
huffmanTree.Build(input);
string input = Console.ReadLine();
BitArray encoded = huffmanTree.Encode(input);
// Print the bits
Console.Write("Encoded Bits: ");
foreach (bool bit in encoded)
{
Console.Write((bit ? 1 : 0) + "");
}
Console.WriteLine();
// Convert the bit array to bytes
Byte[] e = new Byte[(encoded.Length / 8 + (encoded.Length % 8 == 0 ? 0 : 1))];
encoded.CopyTo(e, 0);
// Convert the bytes to string
string output = Encoding.UTF8.GetString(e);
// Convert string back to bytes
e = new Byte[d.Length];
e = Encoding.UTF8.GetBytes(d);
// Convert bytes back to bit array
BitArray todecode = new BitArray(e);
string decoded = huffmanTree.Decode(todecode);
Console.WriteLine("Decoded: " + decoded);
Console.ReadLine();
The Output of Original code from the tutorial is:
The Output of My Code is:
Where am I wrong friends? Help me, Thanks in advance.
You cannot stuff arbitrary bytes into a string. That concept is just undefined. Conversions happen using Encoding.
string output = Encoding.UTF8.GetString(e);
e is just binary garbage at this point, it is not a UTF8 string. So calling UTF8 methods on it does not make sense.
Solution: Don't convert and back-convert to/from string. This does not round-trip. Why are you doing that in the first place? If you need a string use a round-trippable format like base-64 or base-85.
I'm pretty sure Encoding doesn't roundtrip - that is you can't encode an arbitrary sequence of bytes to a string, and then use the same Encoding to get bytes back and always expect them to be the same.
If you want to be able to roundtrip from your raw bytes to string and back to the same raw bytes, you'd need to use base64 encoding e.g.
http://blogs.microsoft.co.il/blogs/mneiter/archive/2009/03/22/how-to-encoding-and-decoding-base64-strings-in-c.aspx
Related
Apologies for the abortive first try, particularly to Olivier. Trying again.
Situation is we have a string coming in from a mainframe to a C# app. We understand it needs to be converted to a byte array. However, this data is a mixture of ASCII characters and true binary UINT16 and UINT32 fields, which are not always in the same spot in the data. Later on we will deserialize the data and will know the structure's data alignments, but not at this juncture.
Logic flow briefly is to send a structure with binary embedded, receive a reply with binary embedded, convert string reply to bytes (this is where we have issues), deserialize the bytes based on an embedded structure name, then process the structure. Until we reach deserialize, we don't know where the UINTs are. Bits are bits at this point.
When we have a reply byte which is ultimately part of a UINT16, and that byte has the high-order bit set (making it "extended ascii" or "negative", however you want to say it), that byte is converted to nulls. So any value >= 128 in that byte is lost.
Our code to convert looks like this:
public async Task<byte[]> SendMessage(byte[] sendBytes)
{
byte[] recvbytes = null;
var url = new Uri("http://<snipped>");
WebRequest webRequest = WebRequest.Create(url);
webRequest.Method = "POST";
webRequest.ContentType = "application/octet-stream";
webRequest.Timeout = 10000;
using (Stream postStream = await webRequest.GetRequestStreamAsync().ConfigureAwait(false))
{
await postStream.WriteAsync(sendBytes, 0, sendBytes.Length);
await postStream.FlushAsync();
}
try
{
string Response;
int Res_lenght;
using (var response = (HttpWebResponse)await webRequest.GetResponseAsync())
using (Stream streamResponse = response.GetResponseStream())
using (StreamReader streamReader = new StreamReader(streamResponse))
{
Response = await streamReader.ReadToEndAsync();
Res_lenght = Response.Length;
}
if (string.IsNullOrEmpty(Response))
{
recvbytes = null;
}
else
{
recvbytes = ConvertToBytes(Response);
var table = (Encoding.Default.GetString(
recvbytes,
0,
recvbytes.Length - 1)).Split(new string[] { "\r\n", "\r", "\n" },
StringSplitOptions.None);
}
}
catch (WebException e)
{
//error
}
return recvbytes;
}
static byte[] ConvertToBytes(string inputString)
{
byte[] outputBytes = new byte[inputString.Length * sizeof(byte)];
String strLocalDate = DateTime.Now.ToString("hh.mm.ss.ffffff");
String fileName = "c:\\deleteMe\\Test" + strLocalDate;
fileName = fileName + ".txt";
StreamWriter writer = new StreamWriter(fileName, true);
for (int i=0;i<inputString.Length;i++) {
try
{
outputBytes[i] = Convert.ToByte(inputString[i]);
writer.Write("String in: {0} \t Byte out: {1} \t Index: {2} \n", inputString.Substring(i, 2), outputBytes[i], i);
}
catch (Exception ex)
{
//error
}
}
writer.Flush();
return outputBytes;
}
ConvertToBytes has a line in the FOR loop to display the values in and out, plus the index value. Here is one of several spots where we see the conversion error - note indexes 698 and 699 represent a UINT16:
String in: sp Byte out: 32 Index: 696 << sp = space
String in: sp Byte out: 32 Index: 697
String in: \0 Byte out: 0 Index: 698
String in: 2 Byte out: 50 Index: 700 << where is 699?
String in: 0 Byte out: 48 Index: 701
String in: 1 Byte out: 49 Index: 702
String in: 6 Byte out: 54 Index: 703
The expected value for index 699 is decimal 156, which is binary 10011100. The high order bit is on. So the conversion for #698 is correct, and for #700, which is an ascii 2 is correct, but not for #699. Given the UINT16 (0/156) is a component of the key to subsequent records, seeing 0/0 for the values is a show-stopper. We don't have a displacement error for 699, we see nulls in the deserialize. No idea why the .Write didn't report it.
Another example, such as 2/210 (decimal 722 when seen as a full UINT16) come out as 2/0 (decimal 512).
Please understand this code as shown above works for everything except the 8-bit reply string fields which have the high-order bit set.
Any suggestions how to convert a string element to byte regardless of the content of the string element would be appreciated. Thanks!
Without a good Minimal, Complete, and Verifiable example that reliably reproduces the problem, it's impossible to state specifically what is wrong. But given what you've posted, some useful observations can be made:
First and foremost, as far as "where is 699?" goes, it's obvious that an exception is being thrown. That's how the WriteLine() call would be skipped and result in no output for that index. You have a couple of opportunities in the code you posted for that to happen: the call to Convert.ToByte(), or the following statement (particularly the call to inputString.Substring()).
Unfortunately, without a good MCVE it's hard to understand why you are printing a two-character substring from the input string, or why you say the characters "sp" become the character value 0x20 (i.e. a space character). The output you describe in the question doesn't appear to be self-consistent. But, let's move on…
Assuming for the moment that at least in the specific case you're looking at, there are enough characters in inputString at that point for the call to Substring() to succeed, we're left with the conclusion that the call to Convert.ToByte() is failing.
Given what you wrote, it seems that the main issue here is a misunderstanding on your part about how text is encoded and manipulated in a C# program. In particular, a C# character is in some sense an abstraction and doesn't have an encoding at all. To the extent that you force the encoding to be revealed, i.e. by casting or otherwise converting the raw character value directly, that value is always encoded as UTF16.
Put another way: you are dealing with a C# string object, made of C# char values. I.e. by the time you get this text into your program and call the ConvertToBytes() method, it's already been converted to UFT16, regardless of the encoding used by the sender.
In UTF16, character values that would be greater than 127 (0x7f) in an "extended ASCII" encoding (e.g. any of the various ANSI/OEM/ISO single-byte encodings) are not encoded as their original value. Instead, they will have a 16-bit value greater than 255.
When you ask Convert.ToByte() to convert such a value to a byte, it will throw an exception, because the value is larger than the largest value that can fit in a byte.
It is fairly clear why the code you posted is producing the results you describe (at least, to some extent). But it is not clear at all what you are actually hoping to accomplish here. I can say that attempting to convert char values to/from byte values by straight casting is simply not going to work. The char type isn't a byte, it's two bytes and any non-ASCII characters will use larger values than can fit in a byte. You should be using one of the several .NET classes that actually will do text encoding, such as the Encoding.GetBytes() method.
Of course, to do that you'll have to make sure you first understand precisely why you are trying to convert to bytes and what encoding you want to use. The code you posted seems to be trying to interpret your encoded bytes as the current Encoding.Default encoding, so you should use that encoding to encode the text. But there's not really any value in encoding to that encoding only to decode back to a C# string value. Assuming you've done it correctly, all that will happen is you'll get exactly the same string you started with.
In other words, while I can explain the behavior you're seeing to the extent that you've described it here, that's unlikely to address whatever broader problem you are actually trying to solve. If the above does not get you back on track, please post a new question in which you've included a good MCVE and a clear explanation of what that broader problem you're trying to solve actually us.
I have the following problem: if the String contains a char that is not known from ASCII, it uses a 63.
Because of that i changed the encoding to UTF8, but I know a char can have the length of two bytes, so I get a out of range error.
How can I solve the problem?
System.Text.ASCIIEncoding enc = new System.Text.ASCIIEncoding();
byte[] baInput = enc.GetBytes(strInput);
// Split byte array (6 Byte) in date (days) and time (ms) parts
byte[] baMsec = new byte[4];
byte[] baDays = new byte[2];
for (int i = 0; i < baInput.Length; i++)
{
if (4 > i)
{
baMsec[i] = baInput[i];
}
else
{
baDays[i - 4] = baInput[i];
}
}
The problem you seem to be having is that you know the number of characters, but not the number of bytes, when using UTF8. To solve just that problem, you could use:
byte[] baMsec = Encoding.UTF8.GetBytes(strInput.SubString(0, 4));
byte[] baDays = Encoding.UTF8.GetBytes(strInput.SubString(4));
Recommended Solution:
1) Split the strInput using the SubString(Int32, Int32) method and get the date and time parts in separate String variables, say strDate and strTime.
2) Then call UTF8Encoding.GetBytes on strDate and strTime and collect the byte array in baDays and baMsec respectively.
Why this works:
C# String is by default UTF-16 encoded, which is equally good to represent non-ASCII characters. Hence, no data is lost.
General Caution:
Never try to directly manipulate encoded strings at byte-level, you'll get lost. Use the String and Encoding class methods of C# to get the bytes if you want bytes.
Alternate approach:
I'm wondering (like others) why your date-time data contains non-numeric characters. I saw in a comment that you get your data from reader["TIMESTAMP2"].ToString(); and the sample content is §║ ê or l¦h. Check if you are interpreting numeric data stored in reader["TIMESTAMP2"] as String by mistake and should you actually treat it as a numeric type. Otherwise, even with this method, you'll be getting unexpected output soon.
The problem is that your baInput can contain more values than both baDays and baMsec can contain. After 6 iterations, you run out of the array size. Hence, the exception.
When you hit the seventh iteration, you get i - 4 which yields 6 - 4 = 2.
Since baDays only has two items, you can set the values on index 0 and 1.
I've handled base64 encoded images and strings and have been able to decode them using C# in the past.
I'm now trying on what looks to me like a base64 string, but the value I'm getting is about 98% accurate and I just don't understand what is affecting the output.
Here is the string:
http://pastebin.com/ntcth6uN
And this is the decoded value:
http://pastebin.com/Buh4xXDA
That IS what it should be, but you can clearly see where there are artifacts and the decoded value isn't quite right.
Any idea why it's failing?
var data = Convert.FromBase64String(Faces[i].InfoData);
Faces[i].InfoData = Encoding.UTF8.GetString(data);
Thanks for your help.
The string is not encoded as UTF8, but instead as another encoding. Thus the same encoding must be used to decode it.
Use the following to decode it:
Encoding.ASCII.GetString(data);
If ASCII isn't the correct encoding, here's some code that will iterate through available encodings and list the first 200 characters in order to manually select original encoding:
String encStr = "PFdhdGNoIG5hbWU9IlBpa2FjaHUDV2F0Y2DDQ2hyb25vIiBkZXNjcmlwdGlvbj0iIiBhdXRob3I9IkFsY2hlbWlzdFByaW1lIiB3ZWJfbGluaz0iaHR0cgov42ZhY2VyZXBv4mNvbS9hcHAvZmFjZXMvdXNlci9BbGNoZW1pc3RQcmltZS8xIiBiZ19jb2xvcj0iYWJhYmFiIiBpbmRfbG9jPSJ0YyIDaW5kX2JnPSJZIiBob3R3b3JkX2xvYz0idGMiIGhvdHdvcmRfYmc9IlkiPDoDICADPExheWVyIHR5cGU9ImltYWdlIiBLPSIwIiB5PSIwIiBneXJvPSIwIiByb3RhdGlvbj0iMCIDc2tld19LPSIwIiBza2V3X3k9IjAiIG9wYWNpdHk9IjEwMCIDYWxpZ25tZW50PSJjYyIDcGF0ag0i4mltZzQ5OTIucHBuZyIDd2lkdGD9IjU1MCIDaGVpZ2h0PSI1NTAiIGNvbG9yPSJmZmZmZmYiIGRpc3BsYXk9ImJkIi8+CiADICA8TGF5ZXIDdHlwZT0idGVLdCIDeg0iMTciIHk9IjQ5IiBneXJvPSIwIiByb3RhdGlvbj0iMCIDc2tld19LPSIwIiBza2V3X3k9IjAiIG9wYWNpdHk9IjEwMCIDYWxpZ25tZW50PSJjYyIDdGVLdg0ie2RofTp7ZG16fSIDdGVLdF9zaXplPSIzMiIDZm9udg0iTENEUEhPTkUiIHRyYW5zZm9ybT0ibiIDY29sb3JfZGltPSIxZgFkMWQiIGNvbG9yPSIxZgFkMWQiIGRpc3BsYXk9ImJkIi8+CiADICA8TGF5ZXIDdHlwZT0idGVLdCIDeg0iOTAiIHk9IjU0IiBneXJvPSIwIiByb3RhdGlvbj0iMCIDc2tld19LPSIwIiBza2V3X3k9IjAiIG9wYWNpdHk9IjEwMCIDYWxpZ25tZW50PSJjYyIDdGVLdg0ie2Rzen0iIHRleHRfc2l6ZT0iMTDiIGZvbnQ9IkxgRFBIT05FIiB0cmFuc2Zvcm09ImLiIGNvbG9yX2RpbT0iMWQxZgFkIiBjb2xvcj0iMWQxZgFkIiBkaXNwbGF5PSJiZCIvPDoDICADPExheWVyIHR5cGU9InRleHQiIHD9IjEyNCIDeT0iNTQiIGd5cm89IjAiIHJvdGF0aW9uPSIwIiBza2V3X3D9IjAiIHNrZXdfeT0iMCIDb3BhY2l0eT0ie3N3cnN9PT0DMCBhbmQDMTAwIG9yIgAiIGFsaWdubWVudg0iY2MiIHRleHQ9IntkYX0iIHRleHRfc2l6ZT0iMjAiIGZvbnQ9IkxgRFBIT05FIiB0cmFuc2Zvcm09ImLiIGNvbG9yX2RpbT0iMWQxZgFkIiBjb2xvcj0iMWQxZgFkIiBkaXNwbGF5PSJiZCIvPDoDICADPExheWVyIHR5cGU9InRleHQiIHD9Ii0LOSIDeT0iNTIiIGd5cm89IjAiIHJvdGF0aW9uPSIwIiBza2V3X3D9IjAiIHNrZXdfeT0iMCIDb3BhY2l0eT0iMTAwIiBhbGlnbm1lbnQ9ImNjIiB0ZXh0PSJ7Ymx9IiB0ZXh0X3NpemU9IjE5IiBmb250PSJMQ0RQSE9ORSIDdHJhbnNmb3JtPSJuIiBjb2xvcl9kaW09IjFkMWQxZCIDY29sb3I9IjFkMWQxZCIDZGlzcGxheT0iYmQi4zLKICADIgxMYXllciB0eXBlPSJzaGFwZSIDeg0iMSIDeT0iNgDiIGd5cm89IjAiIHJvdGF0aW9uPSIwIiBza2V3X3D9IjAiIHNrZXdfeT0iMCIDb3BhY2l0eT0ie3N3cnN9PT0DMCBhbmQDMCBvciAxMgAiIGFsaWdubWVudg0iY2MiIHNoYXBlPSJTcXVhcmUiIHdpZHRoPSIyNzDiIGhlaWdodg0iNgUiIGNvbG9yPSJhYmFiYWIiIGRpc3BsYXk9ImJkIi8+CiADICA8TGF5ZXIDdHlwZT0idGVLdCIDeg0iMCIDeT0iNgDiIGd5cm89IjAiIHJvdGF0aW9uPSIwIiBza2V3X3D9IjAiIHNrZXdfeT0iMCIDb3BhY2l0eT0ie3N3cnN9PT0DMCBhbmQDMCBvciAxMgAiIGFsaWdubWVudg0iY2MiIHRleHQ9Intzd219Ontzd3N9Ontzd3Nzc30iIHRleHRfc2l6ZT0iMzYiIGZvbnQ9IkxgRFBIT05FIiB0cmFuc2Zvcm09ImLiIGNvbG9yX2RpbT0iMWQxZgFkIiBjb2xvcj0iMWQxZgFkIiBkaXNwbGF5PSJiZCIvPDoDICADPExheWVyIHR5cGU9InRleHQiIHD9Ii0xMTDiIHk9Ijk2IiBneXJvPSIwIiByb3RhdGlvbj0iMCIDc2tld19LPSIwIiBza2V3X3k9IjAiIG9wYWNpdHk9IjEwMCIDYWxpZ25tZW50PSJjYyIDdGVLdg0ie3N3cn0DYW5kICZhcG9zO1NUT1AmYXBvczsDb3IDJmFwb3M7U1RBUlQmYXBvczsiIHRleHRfc2l6ZT0iMTUiIGZvbnQ9IlJvYm90by1SZWd1bGFyIiB0cmFuc2Zvcm09ImLiIGNvbG9yX2RpbT0iMWQxZgFkIiBjb2xvcj0iMWQxZgFkIiBkaXNwbGF5PSJiZCIDdGFwX2FjdGlvbj0ic3dfc3RhcnRfc3RvcCIvPDoDICADPExheWVyIHR5cGU9InRleHQiIHD9IjExOCIDeT0iOTYiIGd5cm89IjAiIHJvdGF0aW9uPSIwIiBza2V3X3D9IjAiIHNrZXdfeT0iMCIDb3BhY2l0eT0iMTAwIiBhbGlnbm1lbnQ9ImNjIiB0ZXh0PSJSRVNFVCIDdGVLdF9zaXplPSIxNSIDZm9udg0iUm9ib3Rv4VJlZ3VsYXIiIHRyYW5zZm9ybT0ibiIDY29sb3JfZGltPSIxZgFkMWQiIGNvbG9yPSIxZgFkMWQiIGRpc3BsYXk9ImJkIiB0YXBfYWN0aW9uPSJzd19yZXNldCIvPDoDICADPExheWVyIHR5cGU9InNoYXBlIiBLPSItMTI0IiB5PSIxNgEiIGd5cm89IjAiIHJvdGF0aW9uPSItOTAiIHNrZXdfeg0iMCIDc2tld195PSIwIiBvcGFjaXR5PSIxMgAiIGFsaWdubWVudg0iY2MiIHNoYXBlPSJUcmlhbmdsZSIDd2lkdGD9IjM1IiBoZWlnaHQ9IjM1IiBjb2xvcj0iMWQxZgFkIiBkaXNwbGF5PSJiZCIvPDoDICADPExheWVyIHR5cGU9InNoYXBlIiBLPSItMTE3IiB5PSIxNgQiIGd5cm89IjAiIHJvdGF0aW9uPSIwIiBza2V3X3D9IjAiIHNrZXdfeT0iMCIDb3BhY2l0eT0iMCIDYWxpZ25tZW50PSJjYyIDc2hhcGU9IlNxdWFyZSIDd2lkdGD9IjEyMCIDaGVpZ2h0PSIxMjAiIGNvbG9yPSIyOWI5ZgMiIGRpc3BsYXk9ImJkIiB0YXBfYWN0aW9uPSJzd19zdGFydF9zdG9wIi8+CiADICA8TGF5ZXIDdHlwZT0ic2hhcGUiIHD9IjEyNCIDeT0iMTQxIiBneXJvPSIwIiByb3RhdGlvbj0iOTAiIHNrZXdfeg0iMCIDc2tld195PSIwIiBvcGFjaXR5PSIxMgAiIGFsaWdubWVudg0iY2MiIHNoYXBlPSJUcmlhbmdsZSIDd2lkdGD9IjM1IiBoZWlnaHQ9IjM1IiBjb2xvcj0iMWQxZgFkIiBkaXNwbGF5PSJiZCIvPDoDICADPExheWVyIHR5cGU9InNoYXBlIiBLPSIxMTciIHk9IjE0NCIDZ3lybz0iMCIDcm90YXRpb2L9IjAiIHNrZXdfeg0iMCIDc2tld195PSIwIiBvcGFjaXR5PSIwIiBhbGlnbm1lbnQ9ImNjIiBzaGFwZT0iU3F1YXJlIiB3aWR0ag0iMTIwIiBoZWlnaHQ9IjEyMCIDY29sb3I9IjI5YjlkMyIDZGlzcGxheT0iYmQiIHRhcF9hY3Rpb2L9InN3X3Jlc2V0Ii8+CiADICA8TGF5ZXIDdHlwZT0idGVLdCIDeg0i4TDyIiB5PSItNgMiIGd5cm89IjAiIHJvdGF0aW9uPSIwIiBza2V3X3D9IjAiIHNrZXdfeT0iMCIDb3BhY2l0eT0iMTAwIiBhbGlnbm1lbnQ9ImNjIiB0ZXh0PSJ7d3RkfSIDdGVLdF9zaXplPSIyNSIDZm9udg0iTENEUEhPTkUiIHRyYW5zZm9ybT0ibiIDY29sb3JfZGltPSIxZgFkMWQiIGNvbG9yPSIxZgFkMWQiIGRpc3BsYXk9ImJkIi8+CiADICA8TGF5ZXIDdHlwZT0iaW1hZ2VfY29uZCIDeg0i4TD5IiB5PSItOTIiIGd5cm89IjAiIHJvdGF0aW9uPSIwIiBza2V3X3D9IjAiIHNrZXdfeT0iMCIDb3BhY2l0eT0iMTAwIiBhbGlnbm1lbnQ9ImNjIiBwYXRoPSJ3ZWF0aGVyX3NldF8z4nBwbmciIHdpZHRoPSI3MCIDaGVpZ2h0PSI3MCIDY29sb3I9IjQ1NgU0NSIDY29uZF92YWx1ZT0iJmFwb3M7e3djaX0mYXBvczsDPT0DJmFwb3M7MgFkJmFwb3M7IGFuZCAxIG9yICZhcG9zO3t3Y2l9JmFwb3M7Ig09ICZhcG9zOzAyZCZhcG9zOyBhbmQDMiBvciAmYXBvczt7d2NpfSZhcG9zOyA9PSAmYXBvczswM2QmYXBvczsDYW5kIgMDb3IDJmFwb3M7e3djaX0mYXBvczsDPT0DJmFwb3M7MgRkJmFwb3M7IGFuZCA0IG9yICZhcG9zO3t3Y2l9JmFwb3M7Ig09ICZhcG9zOzA5ZCZhcG9zOyBhbmQDNSBvciAmYXBvczt7d2NpfSZhcG9zOyA9PSAmYXBvczsxMGQmYXBvczsDYW5kIgYDb3IDJmFwb3M7e3djaX0mYXBvczsDPT0DJmFwb3M7MTFkJmFwb3M7IGFuZCA3IG9yICZhcG9zO3t3Y2l9JmFwb3M7Ig09ICZhcG9zOzEzZCZhcG9zOyBhbmQDOCBvciAmYXBvczt7d2NpfSZhcG9zOyA9PSAmYXBvczs1MGQmYXBvczsDYW5kIgkDb3IDMSIDY29uZF9ncmlkPSIzegMiIGRpc3BsYXk9ImJkIi8+CiADICA8TGF5ZXIDdHlwZT0idGVLdCIDeg0iMTEzIiB5PSIzIiBneXJvPSIwIiByb3RhdGlvbj0iMCIDc2tld19LPSIwIiBza2V3X3k9IjAiIG9wYWNpdHk9IjEwMCIDYWxpZ25tZW50PSJjYyIDdGVLdg0iTW9vbiIDdGVLdF9zaXplPSIxOCIDZm9udg0iTENEUEhPTkUiIHRyYW5zZm9ybT0ibiIDY29sb3JfZGltPSIxZgFkMWQiIGNvbG9yPSIxZgFkMWQiIGRpc3BsYXk9ImJkIi8+CiADICA8TGF5ZXIDdHlwZT0ic2hhcGUiIHD9IjExNCIDeT0i4TUyIiBneXJvPSIwIiByb3RhdGlvbj0iMCIDc2tld19LPSIwIiBza2V3X3k9IjAiIG9wYWNpdHk9IjEwMCIDYWxpZ25tZW50PSJjYyIDc2hhcGU9IkNpcmNsZSIDd2lkdGD9IjYwIiBoZWlnaHQ9IjYwIiBjb2xvcj0iNgU0NTQ1IiBkaXNwbGF5PSJiZCIvPDoDICADPExheWVyIHR5cGU9ImltYWdlX2NvbmQiIHD9IjExNCIDeT0i4TUyIiBneXJvPSIwIiByb3RhdGlvbj0iMCIDc2tld19LPSIwIiBza2V3X3k9IjAiIG9wYWNpdHk9IjEwMCIDYWxpZ25tZW50PSJjYyIDcGF0ag0ibW9vbl9zZXRfMy5wcG5nIiB3aWR0ag0iNjAiIGhlaWdodg0iNjAiIGNvbG9yPSJhYmFiYWIiIGNvbmRfdmFsdWU9Int3bXB9IiBjb25kX2dyaWQ9IjNLMyIDZGlzcGxheT0iYmQi4zLKICADIgxMYXllciB0eXBlPSJ0ZXh0IiBLPSIwIiB5PSIwIiBneXJvPSIwIiByb3RhdGlvbj0iMCIDc2tld19LPSIwIiBza2V3X3k9IjAiIG9wYWNpdHk9IjEwMCIDYWxpZ25tZW50PSJjYyIDdGVLdg0iIiB0ZXh0X3NpemU9IjQwIiBmb250PSJSb2JvdG8tUmVndWxhciIDdHJhbnNmb3JtPSJuIiBjb2xvcl9kaW09ImZmZmZmZiIDY29sb3I9ImZmZmZmZiIDZGlzcGxheT0iYmQi4zLKICADIgxMYXllciB0eXBlPSJ0ZXh0IiBLPSIxMCIDeT0i4TExNiIDZ3lybz0iMCIDcm90YXRpb2L9IjAiIHNrZXdfeg0iMCIDc2tld195PSIwIiBvcGFjaXR5PSIxMgAiIGFsaWdubWVudg0iY2MiIHRleHQ9IntkZHd9IHtkbm5ufSB7ZGR9IiB0ZXh0X3NpemU9IjE2IiBmb250PSJMQ0RQSE9ORSIDdHJhbnNmb3JtPSJuIiBjb2xvcl9kaW09IjFkMWQxZCIDY29sb3I9IjFkMWQxZCIDZGlzcGxheT0iYmQi4zLKPC9XYXRjagLK";
var data = Convert.FromBase64String(encStr);
// Iterate over all encodings, and decode
foreach (EncodingInfo ei in Encoding.GetEncodings())
{
Encoding e = ei.GetEncoding();
Console.Write("{0,-15} - {1}{2}", ei.CodePage, e.EncodingName, System.Environment.NewLine);
Console.WriteLine(e.GetString(data).Substring(0, 200));
Console.Write(System.Environment.NewLine + "---------------------" + System.Environment.NewLine);
}
Fiddle: https://dotnetfiddle.net/LcV5s8
I have a uint value that I need to represent as a ByteArray and the convert in a string.
When I convert back the string to a byte array I found different values.
I'm using standard ASCII converter so I don't understand why I'm getting different values.
To be more clear this is what I'm doing:
byte[] bArray = BitConverter.GetBytes((uint)49694);
string test = System.Text.Encoding.ASCII.GetString(bArray);
byte[] result = Encoding.ASCII.GetBytes(test);
The bytearray result is different from the first one:
bArray ->
[0x00000000]: 0x1e
[0x00000001]: 0xc2
[0x00000002]: 0x00
[0x00000003]: 0x00
result ->
[0x00000000]: 0x1e
[0x00000001]: 0x3f
[0x00000002]: 0x00
[0x00000003]: 0x00
Notice that the byte 1 is different in the two arrays.
Thanks for your support.
Regards
string test = System.Text.Encoding.ASCII.GetString(bArray);
byte[] result = Encoding.ASCII.GetBytes(test);
Because raw data is not ASCII. Encoding.GetString is only meaningful if the data you are decoding is text data in that encoding. Anything else: you corrupt it. If you want to store a byte[] as a string, then base-n is necessary - typically base-64 because a: it is conveniently available (Convert.{To|From}Base64String), and b: you can fit it into ASCII, so you rarely hit code-page / encoding issues. For example:
byte[] bArray = BitConverter.GetBytes((uint)49694);
string test = Convert.ToBase64String(bArray); // "HsIAAA=="
byte[] result = Convert.FromBase64String(test);
Because c2 is not a valid ASCII char and it is replaced with '?'(3f)
Converting any byte array to string using SomeEncoding.GetString() is not a safe method as #activwerx suggested in comments. Instead use Convert.FromBase64String, Convert.ToBase64String
I've written my first COM classes. My unit tests work fine, but my first use of the COM objects has hit a snag.
The COM classes provide methods which accept a string, manipulate it and return a string. The consumer of the COM objects is a dBASE PLUS program.
When the input string contains common keyboard characters (ASCII 127 or lower), the COM methods work fine. However, if the string contains characters beyond the ASCII range, some of them get remapped from Windows-1252 to C#'s Unicode. This table shows the mapping that takes place: http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT
For example, if the dBASE program calls the COM object with:
oMyComObject.MyMethod("It will cost€123") where the € is hex 80,
the C# method receives it as Unicode:
public string MyMethod(string source)
{
// source is Unicode and now the Euro symbol is hex 20AC
...
}
I would like to avoid this remapping because I want the original hex content of the string.
I've tried adding the following to MyMethod to convert the string back to Windows-1252, but the Euro symbol gets lost because it becomes a question mark:
byte[] UnicodeBytes = Encoding.Unicode.GetBytes(source.ToString());
byte[] Win1252Bytes = Encoding.Convert(Encoding.Unicode, Encoding.GetEncoding(1252), UnicodeBytes);
string Win1252 = Encoding.GetEncoding(1252).GetString(Win1252Bytes);
Is there a way to prevent this conversion of the "source" parameter to Unicode? Or, is there a way to convert it 100% from Unicode back to Windows-1252?
Yes, I'm answering my own question. The answer by "Jigsore" put me on the right track, but I want to explain more clearly in case someone else makes the same mistake I made.
I eventually figured out that I had misdiagnosed the problem. dBASE was passing the string fine and C# was receiving it fine. It was how I checked the contents of the string that was in error.
This turnkey builds on Jigsore's answer:
void Main()
{
string unicodeText = "\u20AC\u0160\u0152\u0161";
byte[] unicodeBytes = Encoding.Unicode.GetBytes(unicodeText);
byte[] win1252bytes = Encoding.Convert(Encoding.Unicode, Encoding.GetEncoding(1252), unicodeBytes);
for (int i = 0; i < win1252bytes.Length; i++)
Console.Write("0x{0:X2} ", win1252bytes[i]); // output: 0x80 0x8A 0x8C 0x9A
// win1252String represents the string passed from dBASE to C#
string win1252String = Encoding.GetEncoding(1252).GetString(win1252bytes);
Console.WriteLine("\r\nWin1252 string is " + win1252String); // output: Win1252 string is €ŠŒš
Console.WriteLine("looking at the code of the first character the wrong way: " + (int)win1252String[0]);
// output: looking at the code of the first character the wrong way: 8364
byte[] bytes = Encoding.GetEncoding(1252).GetBytes(win1252String[0].ToString());
Console.WriteLine("looking at the code of the first character the right way: " + bytes[0]);
// output: looking at the code of the first character the right way: 128
// Warning: If your input contains character codes which are large in value than what a byte
// can hold (ex: multi-byte Chinese characters), then you will need to look at more than just bytes[0].
}
The reason the first method was wrong is that casting (int)win1252String[0] (or the converse of casting an integer j to a character with (char)j) involves an implicit conversion with the Unicode character set C# uses.
I consider this resolved and would like to thank each person who took the time to comment or answer for their time and trouble. It is appreciated!
Actually you're doing the Unicode to Win-1252 conversion correctly, but you're performing an extra step. The original Win1252 codes are in the Win1252Bytes array!
Check the following code:
string unicodeText = "\u20AC\u0160\u0152\u0161";
byte[] unicodeBytes = Encoding.Unicode.GetBytes(unicodeText);
byte[] win1252bytes = Encoding.Convert(Encoding.Unicode, Encoding.GetEncoding(1252), unicodeBytes);
for (i = 0; i < win1252bytes.Length; i++)
Console.Write("0x{0:X2} ", win1252bytes[i]);
The output shows the Win-1252 codes for the unicodeText string, you can check this by looking at the CP1252.TXT table.