See subject; note that this question only applies to the .NET Compact Framework. This happens on the emulators that ship with the Windows Mobile 6 Professional SDK as well as on my English HTC Touch Pro (all .NET CF 3.5). iso-8859-1 stands for Western European (ISO), which is probably the most important encoding besides us-ascii (at least if one goes by the number of Usenet posts).
I'm having a hard time understanding why this encoding is not supported, while the following ones are supported (again, on both the emulators and my HTC):
iso-8859-2 (Central European (ISO))
iso-8859-3 (Latin 3 (ISO))
iso-8859-4 (Baltic (ISO))
iso-8859-5 (Cyrillic (ISO))
iso-8859-7 (Greek (ISO))
So, is support for, say, Greek more important than support for German, French, and Spanish? Can anyone shed some light on this?
Thanks!
Andreas
I would try using "windows-1252" as the encoding string. According to Wikipedia, Windows-1252 is a superset of ISO-8859-1.
System.Text.Encoding.GetEncoding(1252)
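For example, a minimal sketch (assuming the device actually exposes code page 1252; in the 0xA0-0xFF range Windows-1252 matches ISO-8859-1 byte for byte):
Encoding win1252 = System.Text.Encoding.GetEncoding(1252);
byte[] bytes = win1252.GetBytes("Grüße, señor!"); // same bytes an ISO-8859-1 consumer expects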
This MSDN article says:
The .NET Compact Framework supports character encoding on all devices: Unicode (BE and LE), UTF8, UTF7, and ASCII.
There is limited support for code page encoding, and only if the encoding is recognized by the operating system of the device.
The .NET Compact Framework throws a PlatformNotSupportedException if a required encoding is not available on the device.
I believe all (or at least many) of the ISO encodings are code-page encodings and fall under the "limited support" rule. UTF8 is probably your best bet as a replacement.
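If you want to degrade gracefully across devices, you could probe for the encodings at runtime. A sketch (my own fallback order, not from the article):
static Encoding GetWesternEncodingOrFallback()
{
    // Try ISO-8859-1 first, then its superset Windows-1252; either may be
    // missing on a given device, so catch the failures and fall through.
    foreach (int codePage in new int[] { 28591, 1252 })
    {
        try { return Encoding.GetEncoding(codePage); }
        catch (PlatformNotSupportedException) { } // OS doesn't have this code page
        catch (ArgumentException) { }             // code page not recognized at all
    }
    return Encoding.UTF8; // always supported on the .NET Compact Framework
}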
I know it's a bit late, but I made a .NET CF implementation of the ISO-8859-1 encoding. I hope this helps:
namespace System.Text
{
    public class Latin1Encoding : Encoding
    {
        private readonly string m_specialCharset = (char) 0xA0 + @"¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ";

        public override string WebName
        {
            get { return "ISO-8859-1"; }
        }

        public override int CodePage
        {
            get { return 28591; }
        }

        public override int GetByteCount(char[] chars, int index, int count)
        {
            return count; // Latin-1 is a single-byte encoding
        }

        public override int GetBytes(char[] chars, int charIndex, int charCount, byte[] bytes, int byteIndex)
        {
            if (chars == null)
                throw new ArgumentNullException("chars", "null array");
            if (bytes == null)
                throw new ArgumentNullException("bytes", "null array");
            if (charIndex < 0)
                throw new ArgumentOutOfRangeException("charIndex");
            if (charCount < 0)
                throw new ArgumentOutOfRangeException("charCount");
            if (chars.Length - charIndex < charCount)
                throw new ArgumentOutOfRangeException("chars");
            if (byteIndex < 0 || byteIndex > bytes.Length)
                throw new ArgumentOutOfRangeException("byteIndex");
            for (int i = 0; i < charCount; i++)
            {
                char ch = chars[charIndex + i];
                int chVal = ch;
                // Chars below 0xA0 map straight through; 0xA0-0xFF come from the
                // lookup string; anything above 0xFF becomes '?' (63).
                bytes[byteIndex + i] = chVal < 160 ? (byte)ch : (chVal <= byte.MaxValue ? (byte)m_specialCharset[chVal - 160] : (byte)63);
            }
            return charCount;
        }

        public override int GetCharCount(byte[] bytes, int index, int count)
        {
            return count;
        }

        public override int GetChars(byte[] bytes, int byteIndex, int byteCount, char[] chars, int charIndex)
        {
            if (chars == null)
                throw new ArgumentNullException("chars", "null array");
            if (bytes == null)
                throw new ArgumentNullException("bytes", "null array");
            if (byteIndex < 0)
                throw new ArgumentOutOfRangeException("byteIndex");
            if (byteCount < 0)
                throw new ArgumentOutOfRangeException("byteCount");
            if (bytes.Length - byteIndex < byteCount)
                throw new ArgumentOutOfRangeException("bytes");
            if (charIndex < 0 || charIndex > chars.Length)
                throw new ArgumentOutOfRangeException("charIndex");
            for (int i = 0; i < byteCount; ++i)
            {
                byte b = bytes[byteIndex + i];
                chars[charIndex + i] = b < 160 ? (char)b : m_specialCharset[b - 160];
            }
            return byteCount;
        }

        public override int GetMaxByteCount(int charCount)
        {
            return charCount;
        }

        public override int GetMaxCharCount(int byteCount)
        {
            return byteCount;
        }
    }
}
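A quick usage sketch (my addition; the string-based overloads on the Encoding base class route through the overrides above):
Encoding latin1 = new Latin1Encoding();
byte[] bytes = latin1.GetBytes("Müller, 3 café au lait: 7½"); // one byte per character
string roundTrip = latin1.GetString(bytes, 0, bytes.Length);  // back to the original string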
It is odd that 8859-1 isn't supported, but that said, UTF-8 does have the ability to represent all of the 8859-1 characters (and more), so is there a reason you can't just use UTF-8 instead? That's what we do internally, and I just dealt with almost this same issue today. The plus side of using UTF-8 is that you get support for Far East and Cyrillic languages without making modifications and without adding weight to the Western languages.
If anyone is getting an exception on the .NET Compact Framework like this:
System.Text.Encoding.GetEncoding("iso-8859-1") throws PlatformNotSupportedException
you can fall back to Encoding.Default when writing the text out:
FileInfo fileInfo = new FileInfo(FullfileName); // file type: *.txt, *.xml
string yourText = (char) 0xA0 + @"¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ";
using (FileStream s = fileInfo.OpenWrite())
{
    byte[] bytes = Encoding.Default.GetBytes(yourText);
    s.Write(bytes, 0, bytes.Length); // write the byte count, not the char count
}
I have a byte[] array that is loaded from a file that I happen to know contains UTF-8.
In some debugging code, I need to convert it to a string. Is there a one-liner that will do this?
Under the covers it should be just an allocation and a memcopy, so even if it is not implemented, it should be possible.
string result = System.Text.Encoding.UTF8.GetString(byteArray);
There are at least four different ways of doing this conversion.
Encoding's GetString: but you won't be able to get the original bytes back if those bytes contain non-ASCII characters.
BitConverter.ToString: the output is a "-"-delimited string, but there's no built-in .NET method to convert the string back to a byte array.
Convert.ToBase64String: you can easily convert the output string back to a byte array using Convert.FromBase64String. Note: the output string can contain '+', '/' and '='. If you want to use the string in a URL, you need to explicitly encode it.
HttpServerUtility.UrlTokenEncode: you can easily convert the output string back to a byte array using HttpServerUtility.UrlTokenDecode. The output string is already URL-friendly! The downside is that it needs the System.Web assembly if your project is not a web project.
A full example:
byte[] bytes = { 130, 200, 234, 23 }; // A byte array contains non-ASCII (or non-readable) characters
string s1 = Encoding.UTF8.GetString(bytes); // ���
byte[] decBytes1 = Encoding.UTF8.GetBytes(s1); // decBytes1.Length == 10 !!
// decBytes1 not same as bytes
// Using UTF-8 or other Encoding object will get similar results
string s2 = BitConverter.ToString(bytes); // 82-C8-EA-17
String[] tempAry = s2.Split('-');
byte[] decBytes2 = new byte[tempAry.Length];
for (int i = 0; i < tempAry.Length; i++)
decBytes2[i] = Convert.ToByte(tempAry[i], 16);
// decBytes2 same as bytes
string s3 = Convert.ToBase64String(bytes); // gsjqFw==
byte[] decByte3 = Convert.FromBase64String(s3);
// decByte3 same as bytes
string s4 = HttpServerUtility.UrlTokenEncode(bytes); // gsjqFw2
byte[] decBytes4 = HttpServerUtility.UrlTokenDecode(s4);
// decBytes4 same as bytes
A general solution for converting from a byte array to a string when you don't know the encoding (StreamReader defaults to UTF-8 and will detect and honor a BOM if one is present):
static string BytesToStringConverted(byte[] bytes)
{
using (var stream = new MemoryStream(bytes))
{
using (var streamReader = new StreamReader(stream))
{
return streamReader.ReadToEnd();
}
}
}
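If you at least suspect the encoding, an overload taking an explicit encoding avoids the guessing (a sketch of my own; any Encoding instance works here):
static string BytesToStringConverted(byte[] bytes, Encoding encoding)
{
    using (var stream = new MemoryStream(bytes))
    using (var streamReader = new StreamReader(stream, encoding)) // decode with the caller's choice
    {
        return streamReader.ReadToEnd();
    }
}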
Definition:
public static string ConvertByteToString(this byte[] source)
{
return source != null ? System.Text.Encoding.UTF8.GetString(source) : null;
}
Usage:
string result = input.ConvertByteToString();
Converting a byte[] to a string seems simple, but decoding with the wrong encoding can mangle the output. This little function maps each byte directly to a character (effectively Latin-1 decoding), so no byte values are lost:
private string ToString(byte[] bytes)
{
string response = string.Empty;
foreach (byte b in bytes)
response += (Char)b;
return response;
}
There are already some complete answers here, but one thing still worth covering is the difference between pure UTF-8 and UTF-8 with a BOM.
Last week, at my job, I needed to develop functionality that outputs some CSV files with a BOM and other CSV files in pure UTF-8 (without a BOM). Each CSV encoding type is consumed by a different non-standardized API: one API reads UTF-8 with a BOM and the other reads it without a BOM. I researched the concept by reading the Stack Overflow question "What's the difference between UTF-8 and UTF-8 without BOM?" and the Wikipedia article "Byte order mark" before building my approach.
Finally, my C# code for both UTF-8 encoding types (with BOM and pure) looks like the example below:
// For UTF-8 with BOM (the same as the answer shared by Zanoni at the top):
string result = System.Text.Encoding.UTF8.GetString(byteArray);
// For pure UTF-8 (without BOM):
string result = (new UTF8Encoding(false)).GetString(byteArray);
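Note that GetString does not strip a BOM from the decoded string; if your byte array might start with one and you don't want it, you can skip it by hand. A sketch (the EF BB BF prefix is the UTF-8 BOM):
static string DecodeUtf8WithoutBom(byte[] byteArray)
{
    int offset = (byteArray.Length >= 3
        && byteArray[0] == 0xEF && byteArray[1] == 0xBB && byteArray[2] == 0xBF)
        ? 3 : 0; // skip the BOM if present
    return Encoding.UTF8.GetString(byteArray, offset, byteArray.Length - offset);
}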
Using b.ToString("x2") on each byte outputs a hex string such as b4b5dfe475e58b67:
public static class Ext {
public static string ToHexString(this byte[] hex)
{
if (hex == null) return null;
if (hex.Length == 0) return string.Empty;
var s = new StringBuilder();
foreach (byte b in hex) {
s.Append(b.ToString("x2"));
}
return s.ToString();
}
public static byte[] ToHexBytes(this string hex)
{
if (hex == null) return null;
if (hex.Length == 0) return new byte[0];
int l = hex.Length / 2;
var b = new byte[l];
for (int i = 0; i < l; ++i) {
b[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
}
return b;
}
public static bool EqualsTo(this byte[] bytes, byte[] bytesToCompare)
{
if (bytes == null && bytesToCompare == null) return true; // ?
if (bytes == null || bytesToCompare == null) return false;
if (object.ReferenceEquals(bytes, bytesToCompare)) return true;
if (bytes.Length != bytesToCompare.Length) return false;
for (int i = 0; i < bytes.Length; ++i) {
if (bytes[i] != bytesToCompare[i]) return false;
}
return true;
}
}
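A usage sketch for the extension methods above (hypothetical sample bytes):
byte[] data = { 0xB4, 0xB5, 0xDF, 0xE4 };
string hex = data.ToHexString(); // "b4b5dfe4"
byte[] back = hex.ToHexBytes();  // the original four bytes
bool same = data.EqualsTo(back); // true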
There is also the UnicodeEncoding class, which is quite simple to use:
UnicodeEncoding byteConverter = new UnicodeEncoding();
string stringDataForEncoding = "My Secret Data!";
byte[] dataEncoded = byteConverter.GetBytes(stringDataForEncoding);
Console.WriteLine("Data after decoding: {0}", byteConverter.GetString(dataEncoded));
In addition to the selected answer, if you're using .NET 3.5 or .NET 3.5 CE, you have to specify the index of the first byte to decode, and the number of bytes to decode:
string result = System.Text.Encoding.UTF8.GetString(byteArray, 0, byteArray.Length);
Alternatively:
var byteStr = Convert.ToBase64String(bytes);
The BitConverter class can be used to convert a byte[] to a string.
var convertedString = BitConverter.ToString(byteArray);
Documentation for the BitConverter class can be found on MSDN.
To my knowledge, none of the given answers guarantees correct behavior with null termination. Until someone shows me differently, I wrote my own static class for handling this with the following methods:
// Mimics the functionality of strlen() in C/C++.
// Needed because neither StringBuilder nor Encoding.*.GetString() handle \0 well.
static int StringLength(byte[] buffer, int startIndex = 0)
{
    int strlen = 0;
    while
    (
        (startIndex + strlen) < buffer.Length // Stay inside the buffer bounds
        && buffer[startIndex + strlen] != 0   // The typical null-termination check
    )
    {
        ++strlen;
    }
    return strlen;
}
// This is messy, but I haven't found a built-in way in C# that guarantees null termination.
public static string ParseBytes(byte[] buffer, out int strlen, int startIndex = 0)
{
    strlen = StringLength(buffer, startIndex);
    byte[] c_str = new byte[strlen];
    Array.Copy(buffer, startIndex, c_str, 0, strlen);
    return Encoding.UTF8.GetString(c_str);
}
The reason for the startIndex parameter: in the case I was working on, I specifically needed to parse a byte[] as an array of null-terminated strings. It can be safely ignored in the simple case.
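A usage sketch for that array-of-strings case (hypothetical buffer holding "foo\0bar\0"; assumes the methods above are in scope):
byte[] buffer = { (byte)'f', (byte)'o', (byte)'o', 0, (byte)'b', (byte)'a', (byte)'r', 0 };
int index = 0;
while (index < buffer.Length)
{
    int strlen;
    Console.WriteLine(ParseBytes(buffer, out strlen, index)); // "foo", then "bar"
    index += strlen + 1; // step past the null terminator
}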
A LINQ one-liner for converting a byte array byteArrFilename read from a file to a pure-ASCII C-style zero-terminated string would be this (handy for reading things like file index tables in old archive formats):
String filename = new String(byteArrFilename.TakeWhile(x => x != 0)
.Select(x => x < 128 ? (Char)x : '?').ToArray());
I use '?' as the default character for anything not pure ASCII here, but that can be changed, of course. If you want to be sure you can detect it, just use '\0' instead, since the TakeWhile at the start ensures that a string built this way cannot possibly contain '\0' values from the input source.
Try this console application:
static void Main(string[] args)
{
//Encoding _UTF8 = Encoding.UTF8;
string[] _mainString = { "Hello, World!" };
Console.WriteLine("Main String: " + _mainString);
// Convert a string to UTF-8 bytes.
byte[] _utf8Bytes = Encoding.UTF8.GetBytes(_mainString[0]);
// Convert UTF-8 bytes to a string.
string _stringUnicode = Encoding.UTF8.GetString(_utf8Bytes);
Console.WriteLine("String Unicode: " + _stringUnicode);
}
Here is a way where you don't have to bother with choosing an encoding. I used it in my network class and send binary objects as strings with it.
public static byte[] String2ByteArray(string str)
{
char[] chars = str.ToArray();
byte[] bytes = new byte[chars.Length * 2];
for (int i = 0; i < chars.Length; i++)
Array.Copy(BitConverter.GetBytes(chars[i]), 0, bytes, i * 2, 2);
return bytes;
}
public static string ByteArray2String(byte[] bytes)
{
char[] chars = new char[bytes.Length / 2];
for (int i = 0; i < chars.Length; i++)
chars[i] = BitConverter.ToChar(bytes, i * 2);
return new string(chars);
}
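As a side note (my observation, not part of the original answer): since BitConverter follows the machine's byte order, on a little-endian machine this layout is exactly UTF-16LE, so the built-in Encoding.Unicode round-trips the same bytes:
byte[] bytes = Encoding.Unicode.GetBytes("hello");             // same bytes as String2ByteArray("hello")
string s = Encoding.Unicode.GetString(bytes, 0, bytes.Length); // same result as ByteArray2String(bytes)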
string result = Encoding.UTF8.GetString(byteArray);
I am trying to achieve the best possible compression for data that consists of just 1s and 0s in a matrix.
To demonstrate what I mean, here's a sample 6 by 6 matrix:
1,0,0,1,1,1
0,1,0,1,1,1
1,0,0,1,0,0
0,1,1,0,1,1
1,0,0,0,0,1
0,1,0,1,0,1
I'd like to compress that into as small a string or byte array as possible. The matrices I need to compress are bigger, though (always 4096 by 4096 1s and 0s).
I suppose it could be compressed quite heavily, but I'm not sure how. I'll mark the best compression as the answer. Performance does not matter.
I assume that you want to compress strings into other strings even though your data really is binary. I don't know what the best compression algorithm is (that will vary depending on your data), but you can convert the input text into bits, compress those, and then convert the compressed bytes into a string again using base-64 encoding. This will allow you to go from string to string and still apply a compression algorithm of your choice.
The .NET Framework provides the DeflateStream class, which will allow you to compress a stream of bytes. The first step is to create a custom Stream that can read and write your text format. For lack of a better name I have named it TextStream. Note that to simplify matters a bit I use \n as the line ending (instead of \r\n).
class TextStream : Stream {
readonly String text;
readonly Int32 bitsPerLine;
readonly StringBuilder buffer;
Int32 textPosition;
// Initialize a readable stream.
public TextStream(String text) {
if (text == null)
throw new ArgumentNullException("text");
this.text = text;
}
// Initialize a writeable stream.
public TextStream(Int32 bitsPerLine) {
if (bitsPerLine <= 0)
throw new ArgumentException();
this.bitsPerLine = bitsPerLine;
this.buffer = new StringBuilder();
}
public override Boolean CanRead { get { return this.text != null; } }
public override Boolean CanWrite { get { return this.buffer != null; } }
public override Boolean CanSeek { get { return false; } }
public override Int64 Length { get { throw new InvalidOperationException(); } }
public override Int64 Position {
get { throw new InvalidOperationException(); }
set { throw new InvalidOperationException(); }
}
public override void Flush() {
}
public override Int32 Read(Byte[] buffer, Int32 offset, Int32 count) {
// TODO: Validate buffer, offset and count.
if (!CanRead)
throw new InvalidOperationException();
var byteCount = 0;
Byte currentByte = 0;
var bitCount = 0;
for (; byteCount < count && this.textPosition < this.text.Length; this.textPosition += 1) {
if (text[this.textPosition] != '0' && text[this.textPosition] != '1')
continue;
currentByte = (Byte) ((currentByte << 1) | (this.text[this.textPosition] == '0' ? 0 : 1));
bitCount += 1;
if (bitCount == 8) {
buffer[offset + byteCount] = currentByte;
byteCount += 1;
currentByte = 0;
bitCount = 0;
}
}
if (bitCount > 0) {
buffer[offset + byteCount] = currentByte;
byteCount += 1;
}
return byteCount;
}
public override void Write(Byte[] buffer, Int32 offset, Int32 count) {
// TODO: Validate buffer, offset and count.
if (!CanWrite)
throw new InvalidOperationException();
for (var i = 0; i < count; ++i) {
var currentByte = buffer[offset + i];
for (var mask = 0x80; mask > 0; mask /= 2) {
if (this.buffer.Length > 0) {
if ((this.buffer.Length + 1)%(2*this.bitsPerLine) == 0)
this.buffer.Append('\n');
else
this.buffer.Append(',');
}
this.buffer.Append((currentByte & mask) == 0 ? '0' : '1');
}
}
}
public override String ToString() {
if (this.text != null)
return this.text;
else
return this.buffer.ToString();
}
public override Int64 Seek(Int64 offset, SeekOrigin origin) {
throw new InvalidOperationException();
}
public override void SetLength(Int64 length) {
throw new InvalidOperationException();
}
}
Then you can write methods for compressing and decompressing using DeflateStream. Note that the uncompressed input is a string like the one you provided in your question and the compressed output is a base-64 encoded string.
String Compress(String text) {
using (var inputStream = new TextStream(text))
using (var outputStream = new MemoryStream()) {
using (var compressedStream = new DeflateStream(outputStream, CompressionMode.Compress))
inputStream.CopyTo(compressedStream);
return Convert.ToBase64String(outputStream.ToArray());
}
}
String Decompress(String compressedText, Int32 bitsPerLine) {
var bytes = Convert.FromBase64String(compressedText);
using (var inputStream = new MemoryStream(bytes))
using (var outputStream = new TextStream(bitsPerLine)) {
using (var compressedStream = new DeflateStream(inputStream, CompressionMode.Decompress))
compressedStream.CopyTo(outputStream);
return outputStream.ToString();
}
}
To test it I used a method to create a random string (using a fixed seed to always create the same string):
String CreateRandomString(Int32 width, Int32 height) {
var random = new Random(0);
var stringBuilder = new StringBuilder();
for (var i = 0; i < width; ++i) {
for (var j = 0; j < height; ++j) {
if (i > 0 && j == 0)
stringBuilder.Append('\n');
else if (j > 0)
stringBuilder.Append(',');
stringBuilder.Append(random.Next(2) == 0 ? '0' : '1');
}
}
return stringBuilder.ToString();
}
Creating a random 4,096 x 4,096 string has an uncompressed size of 33,554,431 characters. This is compressed to 2,797,056 characters which is a reduction to about 8% of the original size.
Skipping the base-64 encoding would increase the compression ratio even more but the output would be binary and not a string. If you also consider the input as binary you actually get the following result for random data with equal probability of 0 and 1:
Input bytes: 4,096 x 4,096 / 8 = 2,097,152
Output bytes: 2,097,792
Size after compression: 100%
Simply converting to bytes is better than doing that followed by a deflate. However, using random input with 25% 0s and 75% 1s, you get this result:
Input bytes: 4,096 x 4,096 / 8 = 2,097,152
Output bytes: 1,757,846
Size after compression: 84%
How much deflate will compress your data really depends on the nature of the data. If it is completely random, you won't be able to get much compression after converting from text to bytes.
Hmm... as small as possible is not really possible without knowing the problem domain.
Here's the general approach:
Represent the ones and zeros in the array using bits not bytes or characters or whatever.
Compress using a general purpose loss-less compression algorithm. The two most common are:
Huffman encoding and some type of LZW.
Huffman coding can be proven to produce an optimal prefix code for a given symbol distribution; the catch is that in order to decompress the data you also need the Huffman tree, which may be as big as the original data. LZW gives you compression close to Huffman (within a few percent) for most inputs, but performs best on data with repeating segments, such as text.
Implementations of these compression algorithms should be easy to come by (GZIP uses DEFLATE, which is based on LZ77, an earlier algorithm from the same family as LZW).
For good implementations of compression algorithms using modern techniques, go to 7zip.org. It's open source and there is a C API with a DLL, but you'll have to create the .NET interface (unless someone has already made one).
The non-general approach:
This relies on a known characteristic of the data. For example: if you know most of the data is zeroes, you can encode only the coordinates of the ones.
If the data contains patches of ones and zeros, they can be encoded with RLE or two-dimensional variants of the algorithm.
Trying to create your own algorithm for specifically compressing this data will most likely not yield much.
Create a GZipStream with the maximum compression level.
Run a 4096x4096 loop:
- set all 64 bits of a ulong to bits of the array
- when 64 bits are done, write the ulong to the compression stream and start at the first bit again
This will very easily pack your matrix into a pretty well-compressed block of memory; a rough sketch follows.
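A rough sketch of those steps (my interpretation, assuming the matrix is a bool[,]; GZipStream lives in System.IO.Compression, and the CompressionLevel overload needs .NET 4.5+):
static byte[] CompressMatrix(bool[,] matrix) // e.g. 4096 x 4096
{
    using (var memory = new MemoryStream())
    {
        using (var gzip = new GZipStream(memory, CompressionLevel.Optimal))
        {
            ulong block = 0;
            int bitCount = 0;
            foreach (bool bit in matrix) // row-major walk over every cell
            {
                block = (block << 1) | (bit ? 1UL : 0UL);
                if (++bitCount == 64) // a full ulong: write it out and start over
                {
                    byte[] raw = BitConverter.GetBytes(block);
                    gzip.Write(raw, 0, raw.Length);
                    block = 0;
                    bitCount = 0;
                }
            }
            // 4096 * 4096 is divisible by 64, so there are no leftover bits to flush.
        }
        return memory.ToArray();
    }
}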
Using Huffman coding you can compress it quite a lot:
0 => 111
1 => 10
, => 0
\r => 1100
\n => 1101
This yields, for your example matrix (in bits):
10011101 11010010 01011001 10111101 00111010 01001011 00110110 01110111
01001110 11111001 10111101 00100111 01001011 00110110 01110111 01110111
01011001 10111101 00111010 0111010
If the commas, line feeds and carriage returns can be excluded, then you only need a BitArray to store each value, although you then need to know the dimensions of the matrix when decoding. If you're planning on serializing the data and don't know them, you could store the dimensions as ints followed by the data itself.
Something like:
var input = #"1,0,0,1,1,1
0,1,0,1,1,1
1,0,0,1,0,0
0,1,1,0,1,1
1,0,0,0,0,1
0,1,0,1,0,1";
var values = new List<bool>();
foreach(var c in input)
{
if (c == '0')
values.Add(false);
else if (c == '1')
values.Add(true);
}
var ba = new BitArray(values.ToArray());
then serialize the BitArray. You'd probably need to record the number of padding bits to properly decode the data, though 4096 * 4096 is divisible by 8, so there would be none in your case. A packing sketch follows.
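For instance (my addition, reusing the ba variable from the snippet above; BitArray.CopyTo into a byte[] does the bit packing for you):
byte[] packed = new byte[(ba.Length + 7) / 8];
ba.CopyTo(packed, 0); // 4096 * 4096 bits pack into 2,097,152 bytes
// write packed, optionally prefixed by the matrix dimensions, to your output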
The BitArray approach should get you the most compression unless there is a significant amount of repeating patterns in the matrix (yes I'm assuming the data is mostly random).
I need to convert a string to 7-bit ASCII with even parity in a C# application which communicates with a mainframe. I tried using Encoding.ASCII, however, it does not have the correct parity.
Perhaps something like this:
public static byte[] StringTo7bitAsciiEvenParity(string text)
{
byte[] bytes = Encoding.ASCII.GetBytes(text);
for(int i = 0; i < bytes.Length; i++)
{
if(((((bytes[i] * 0x0101010101010101UL) & 0x8040201008040201UL) % 0x1FF) & 1) == 0)
{
bytes[i] &= 0x7F;
}
else
{
bytes[i] |= 0x80;
}
}
return bytes;
}
Completely untested. Don't ask me how the magic that computes the parity works; I just found it here. But some variant of this should do what you want.
I tried to make this work for the case ',' becoming 0x82. But I can't work out how that is supposed to be done. ',' is, in ASCII, binary 00101100 and 0x82 is binary 10000010. I don't see the correlation at all.
You will have to calculate the parity bit yourself. So:
convert the string into a byte array using the built-in ASCII encoding; the top bit should always be 0 with this encoding.
do the bit modifications on each byte and set the parity bit for each byte.
There is no built-in functionality for calculating the parity bit that I know of, but it is simple enough to do by hand, as sketched below.
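A straightforward per-byte computation (a simple sketch of my own; the bit-twiddling versions elsewhere in this thread are faster):
static byte WithEvenParity(byte b)
{
    int ones = 0;
    for (int i = 0; i < 7; i++) // count set bits among the 7 data bits
        if ((b & (1 << i)) != 0) ones++;
    // Even parity: the total number of 1s, parity bit included, must be even.
    return (ones % 2 == 1) ? (byte)(b | 0x80) : (byte)(b & 0x7F);
}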
Assuming byte-level parity and no-endian related behavior (i.e. a stream of bytes), the following works as intended (tested against a known 7-bit ASCII even parity table):
public static byte[] GetAsciiBytesEvenParity(this string text)
{
byte[] bytes = Encoding.ASCII.GetBytes(text);
for(int ii = 0; ii < bytes.Length; ii++)
{
// parity test adapted from:
// http://graphics.stanford.edu/~seander/bithacks.html#ParityParallel
if (((0x6996 >> ((bytes[ii] ^ (bytes[ii] >> 4)) & 0xf)) & 1) != 0)
{
bytes[ii] |= 0x80;
}
}
return bytes;
}
Is there a way to use ASCIIEncoding in Windows Phone 7?
Unless I'm doing something wrong, Encoding.ASCII doesn't exist, and I need it for C# -> PHP encryption (as PHP only uses ASCII in SHA1 hashing).
Any suggestions?
It is easy to implement yourself; Unicode never messed with the ASCII codes:
public static byte[] StringToAscii(string s) {
byte[] retval = new byte[s.Length];
for (int ix = 0; ix < s.Length; ++ix) {
char ch = s[ix];
if (ch <= 0x7f) retval[ix] = (byte)ch;
else retval[ix] = (byte)'?';
}
return retval;
}
Not really seeing any detail in your question, this could be off track. You are right that Silverlight has no support for the ASCII encoding.
However, I suspect that in fact UTF-8 will do what you need. It's worth bearing in mind that a sequence of single-byte ASCII-only characters and the same set of characters encoded as UTF-8 are identical. That is, the complete ASCII character set is repeated verbatim by the first 128 single-byte code points in UTF-8.
I have a Silverlight app that writes CSV files, which have to be encoded in ASCII (using UTF-8 causes accented characters to show up wrong when you open the files in Excel).
Since Silverlight doesn't have an Encoding.ASCII class, I implemented one as follows. It works for me, hope it's useful to you as well:
/// <summary>
/// Silverlight doesn't have an ASCII encoder, so here is one:
/// </summary>
public class AsciiEncoding : System.Text.Encoding
{
public override int GetMaxByteCount(int charCount)
{
return charCount;
}
public override int GetMaxCharCount(int byteCount)
{
return byteCount;
}
public override int GetByteCount(char[] chars, int index, int count)
{
return count;
}
public override byte[] GetBytes(char[] chars)
{
return base.GetBytes(chars);
}
public override int GetCharCount(byte[] bytes)
{
return bytes.Length;
}
public override int GetBytes(char[] chars, int charIndex, int charCount, byte[] bytes, int byteIndex)
{
for (int i = 0; i < charCount; i++)
{
bytes[byteIndex + i] = (byte)chars[charIndex + i];
}
return charCount;
}
public override int GetCharCount(byte[] bytes, int index, int count)
{
return count;
}
public override int GetChars(byte[] bytes, int byteIndex, int byteCount, char[] chars, int charIndex)
{
for (int i = 0; i < byteCount; i++)
{
chars[charIndex + i] = (char)bytes[byteIndex + i];
}
return byteCount;
}
}
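A usage sketch (my addition; note the overrides above simply cast char to byte and back, so characters 0-255 pass through untouched, which is what keeps accented characters intact for Excel):
Encoding enc = new AsciiEncoding();
byte[] csvBytes = enc.GetBytes("id;name\n1;Müller");
string back = enc.GetString(csvBytes, 0, csvBytes.Length);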
I had a similar problem using Xamarin (Mono) for Android, where I'm using a Portable Class Library, and those don't support Encoding.ASCII.
Instead, the only working solution (apart from doing it manually) was this one:
Uri.EscapeDataString(yourString);
See this answer, which provides additional information.
I started from Hans Passant's answer and rewrote it with LINQ:
/// <summary>
/// Gets an encoding for the ASCII (7-bit) character set.
/// </summary>
/// <see cref="http://stackoverflow.com/a/4022893/1248177"/>
/// <param name="s">A character set.</param>
/// <returns>An encoding for the ASCII (7-bit) character set.</returns>
public static byte[] StringToAscii(string s)
{
return (from char c in s select (byte)((c <= 0x7f) ? c : '?')).ToArray();
}
You may want to remove the call to ToArray() and return an IEnumerable<byte> instead of byte[].
According to this MS forum thread, Windows Phone 7 does not support Encoding.ASCII.