C# Convert HEX (Korean) To Korean Letters - c#

I have a Hex string that I am converting into characters. This is the function I am using.
public string GetAsciiString(bool replaceNewline = true)
{
char[] chars = new char[data.Length + 1];
byte[] bytes = new byte[data.Length + 1];
bytes[0] = opcode;
Array.Copy(data, 0, bytes, 1, data.Length);
for (int i = 0; i < bytes.Length; ++i)
{
byte value = bytes[i];
if ((value == '\n' || value == '\r') && !replaceNewline)
chars[i] = (char)value;
else if (value < 32 || value > 126)
chars[i] = '.';
else chars[i] = (char)value;
}
return new string(chars);
}
However this only displays english characters and not korean. Any idea on how I can get it to display Korean?
Edit: I see the issue was that I was converting to Ascii.

Okay, there are multiple ways to go at this.
The operation to turn byte arrays into strings is known as decoding. In .NET, this is done with the Encoding class. In your case, you must first find the corresponding encoding. Looking at the documentation linked above, encoding ks_c_5601-1987 corresponds to code page 949, so:
var encoding = Encoding.GetEncoding(949);
Once you have that encoding, you can use it to decode the bytes:
var text = encoding.GetString(bytes);
This leaves the matter of those bytes. If you already have a byte array, you're good to go. If you have a hex string, I suggest you look into existing questions for this, for instance: https://stackoverflow.com/a/311165/.
If your data is in a stream, I recommend you use a StreamReader for this instead:
using(var reader = new StreamReader(stream, encoding)) // use encoding to decode from stream
{
var text = reader.ReadToEnd();
}

Related

How to read blob data from MySQL in c# [duplicate]

I have a byte[] array that is loaded from a file that I happen to known contains UTF-8.
In some debugging code, I need to convert it to a string. Is there a one-liner that will do this?
Under the covers it should be just an allocation and a memcopy, so even if it is not implemented, it should be possible.
string result = System.Text.Encoding.UTF8.GetString(byteArray);
There're at least four different ways doing this conversion.
Encoding's GetString, but you won't be able to get the original bytes back if those bytes have non-ASCII characters.
BitConverter.ToString The output is a "-" delimited string, but there's no .NET built-in method to convert the string back to byte array.
Convert.ToBase64String You can easily convert the output string back to byte array by using Convert.FromBase64String. Note: The output string could contain '+', '/' and '='. If you want to use the string in a URL, you need to explicitly encode it.
HttpServerUtility.UrlTokenEncodeYou can easily convert the output string back to byte array by using HttpServerUtility.UrlTokenDecode. The output string is already URL friendly! The downside is it needs System.Web assembly if your project is not a web project.
A full example:
byte[] bytes = { 130, 200, 234, 23 }; // A byte array contains non-ASCII (or non-readable) characters
string s1 = Encoding.UTF8.GetString(bytes); // ���
byte[] decBytes1 = Encoding.UTF8.GetBytes(s1); // decBytes1.Length == 10 !!
// decBytes1 not same as bytes
// Using UTF-8 or other Encoding object will get similar results
string s2 = BitConverter.ToString(bytes); // 82-C8-EA-17
String[] tempAry = s2.Split('-');
byte[] decBytes2 = new byte[tempAry.Length];
for (int i = 0; i < tempAry.Length; i++)
decBytes2[i] = Convert.ToByte(tempAry[i], 16);
// decBytes2 same as bytes
string s3 = Convert.ToBase64String(bytes); // gsjqFw==
byte[] decByte3 = Convert.FromBase64String(s3);
// decByte3 same as bytes
string s4 = HttpServerUtility.UrlTokenEncode(bytes); // gsjqFw2
byte[] decBytes4 = HttpServerUtility.UrlTokenDecode(s4);
// decBytes4 same as bytes
A general solution to convert from byte array to string when you don't know the encoding:
static string BytesToStringConverted(byte[] bytes)
{
using (var stream = new MemoryStream(bytes))
{
using (var streamReader = new StreamReader(stream))
{
return streamReader.ReadToEnd();
}
}
}
Definition:
public static string ConvertByteToString(this byte[] source)
{
return source != null ? System.Text.Encoding.UTF8.GetString(source) : null;
}
Using:
string result = input.ConvertByteToString();
Converting a byte[] to a string seems simple, but any kind of encoding is likely to mess up the output string. This little function just works without any unexpected results:
private string ToString(byte[] bytes)
{
string response = string.Empty;
foreach (byte b in bytes)
response += (Char)b;
return response;
}
I saw some answers at this post and it's possible to be considered completed base knowledge, because I have a several approaches in C# Programming to resolve the same problem. The only thing that is necessary to be considered is about a difference between pure UTF-8 and UTF-8 with a BOM.
Last week, at my job, I needed to develop one functionality that outputs CSV files with a BOM and other CSV files with pure UTF-8 (without a BOM). Each CSV file encoding type will be consumed by different non-standardized APIs. One API reads UTF-8 with a BOM and the other API reads without a BOM. I needed to research the references about this concept, reading the "What's the difference between UTF-8 and UTF-8 without BOM?" Stack Overflow question, and the Wikipedia article "Byte order mark" to build my approach.
Finally, my C# Programming for both UTF-8 encoding types (with BOM and pure) needed to be similar to this example below:
// For UTF-8 with BOM, equals shared by Zanoni (at top)
string result = System.Text.Encoding.UTF8.GetString(byteArray);
//for Pure UTF-8 (without B.O.M.)
string result = (new UTF8Encoding(false)).GetString(byteArray);
Using (byte)b.ToString("x2"), Outputs b4b5dfe475e58b67
public static class Ext {
public static string ToHexString(this byte[] hex)
{
if (hex == null) return null;
if (hex.Length == 0) return string.Empty;
var s = new StringBuilder();
foreach (byte b in hex) {
s.Append(b.ToString("x2"));
}
return s.ToString();
}
public static byte[] ToHexBytes(this string hex)
{
if (hex == null) return null;
if (hex.Length == 0) return new byte[0];
int l = hex.Length / 2;
var b = new byte[l];
for (int i = 0; i < l; ++i) {
b[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
}
return b;
}
public static bool EqualsTo(this byte[] bytes, byte[] bytesToCompare)
{
if (bytes == null && bytesToCompare == null) return true; // ?
if (bytes == null || bytesToCompare == null) return false;
if (object.ReferenceEquals(bytes, bytesToCompare)) return true;
if (bytes.Length != bytesToCompare.Length) return false;
for (int i = 0; i < bytes.Length; ++i) {
if (bytes[i] != bytesToCompare[i]) return false;
}
return true;
}
}
There is also class UnicodeEncoding, quite simple in usage:
ByteConverter = new UnicodeEncoding();
string stringDataForEncoding = "My Secret Data!";
byte[] dataEncoded = ByteConverter.GetBytes(stringDataForEncoding);
Console.WriteLine("Data after decoding: {0}", ByteConverter.GetString(dataEncoded));
In addition to the selected answer, if you're using .NET 3.5 or .NET 3.5 CE, you have to specify the index of the first byte to decode, and the number of bytes to decode:
string result = System.Text.Encoding.UTF8.GetString(byteArray, 0, byteArray.Length);
Alternatively:
var byteStr = Convert.ToBase64String(bytes);
The BitConverter class can be used to convert a byte[] to string.
var convertedString = BitConverter.ToString(byteAttay);
Documentation of BitConverter class can be fount on MSDN.
To my knowledge none of the given answers guarantee correct behavior with null termination. Until someone shows me differently I wrote my own static class for handling this with the following methods:
// Mimics the functionality of strlen() in c/c++
// Needed because niether StringBuilder or Encoding.*.GetString() handle \0 well
static int StringLength(byte[] buffer, int startIndex = 0)
{
int strlen = 0;
while
(
(startIndex + strlen + 1) < buffer.Length // Make sure incrementing won't break any bounds
&& buffer[startIndex + strlen] != 0 // The typical null terimation check
)
{
++strlen;
}
return strlen;
}
// This is messy, but I haven't found a built-in way in c# that guarentees null termination
public static string ParseBytes(byte[] buffer, out int strlen, int startIndex = 0)
{
strlen = StringLength(buffer, startIndex);
byte[] c_str = new byte[strlen];
Array.Copy(buffer, startIndex, c_str, 0, strlen);
return Encoding.UTF8.GetString(c_str);
}
The reason for the startIndex was in the example I was working on specifically I needed to parse a byte[] as an array of null terminated strings. It can be safely ignored in the simple case
A LINQ one-liner for converting a byte array byteArrFilename read from a file to a pure ASCII C-style zero-terminated string would be this: Handy for reading things like file index tables in old archive formats.
String filename = new String(byteArrFilename.TakeWhile(x => x != 0)
.Select(x => x < 128 ? (Char)x : '?').ToArray());
I use '?' as the default character for anything not pure ASCII here, but that can be changed, of course. If you want to be sure you can detect it, just use '\0' instead, since the TakeWhile at the start ensures that a string built this way cannot possibly contain '\0' values from the input source.
Try this console application:
static void Main(string[] args)
{
//Encoding _UTF8 = Encoding.UTF8;
string[] _mainString = { "Hello, World!" };
Console.WriteLine("Main String: " + _mainString);
// Convert a string to UTF-8 bytes.
byte[] _utf8Bytes = Encoding.UTF8.GetBytes(_mainString[0]);
// Convert UTF-8 bytes to a string.
string _stringuUnicode = Encoding.UTF8.GetString(_utf8Bytes);
Console.WriteLine("String Unicode: " + _stringuUnicode);
}
Here is a result where you didn’t have to bother with encoding. I used it in my network class and send binary objects as string with it.
public static byte[] String2ByteArray(string str)
{
char[] chars = str.ToArray();
byte[] bytes = new byte[chars.Length * 2];
for (int i = 0; i < chars.Length; i++)
Array.Copy(BitConverter.GetBytes(chars[i]), 0, bytes, i * 2, 2);
return bytes;
}
public static string ByteArray2String(byte[] bytes)
{
char[] chars = new char[bytes.Length / 2];
for (int i = 0; i < chars.Length; i++)
chars[i] = BitConverter.ToChar(bytes, i * 2);
return new string(chars);
}
string result = ASCIIEncoding.UTF8.GetString(byteArray);

Reading a single UTF8 character from stream in C#

I am looking to read the next UTF8 character from a Stream or BinaryReader. Things that don't work:
BinaryReader::ReadChar -- this will throw on a 3 or 4 byte character. Since it returns a two byte structure, it has no choice.
BinaryReader::ReadChars -- this will throw if you ask it to read 1 character and it encounters a 3 or 4 byte character. Will read multiple characters if you ask it to read more than 1 character.
StreamReader::Read -- this needs to know how many bytes to read, but the number of bytes in a UTF8 character is variable.
The code I have that seems to work:
private char[] ReadUTF8Char(Stream s)
{
byte[] bytes = new byte[4];
var enc = new UTF8Encoding(false, true);
if (1 != s.Read(bytes, 0, 1))
return null;
if (bytes[0] <= 0x7F) //Single byte character
{
return enc.GetChars(bytes, 0, 1);
}
else
{
var remainingBytes =
((bytes[0] & 240) == 240) ? 3 : (
((bytes[0] & 224) == 224) ? 2 : (
((bytes[0] & 192) == 192) ? 1 : -1
));
if (remainingBytes == -1)
return null;
s.Read(bytes, 1, remainingBytes);
return enc.GetChars(bytes, 0, remainingBytes + 1);
}
}
Obviously, this is a bit of a mess, and somewhat specific to UTF8. Is there a more elegant, less custom, easier-to-read solution to this problem?
I know this question is a bit old but here is another solution. It is not as good in performance as the OPs solution (which I also prefer), but it only uses builtin-utf8-functionality without knowing about the utf8-encoding internals.
private static char ReadUTF8Char(Stream s)
{
if (s.Position >= s.Length)
throw new Exception("Error: Read beyond EOF");
using (BinaryReader reader = new BinaryReader(s, Encoding.Unicode, true))
{
int numRead = Math.Min(4, (int)(s.Length - s.Position));
byte[] bytes = reader.ReadBytes(numRead);
char[] chars = Encoding.UTF8.GetChars(bytes);
if (chars.Length == 0)
throw new Exception("Error: Invalid UTF8 char");
int charLen = Encoding.UTF8.GetByteCount(new char[] { chars[0] });
s.Position += (charLen - numRead);
return chars[0];
}
}
The encoding passed to the constructor of BinaryReader doesn't matter. I had to use this version of the constructor to leave the stream open. If you already have a binary reader you can just use this:
private static char ReadUTF8Char(BinaryReader reader)
{
var s = reader.BaseStream;
if (s.Position >= s.Length)
throw new Exception("Error: Read beyond EOF");
int numRead = Math.Min(4, (int)(s.Length - s.Position));
byte[] bytes = reader.ReadBytes(numRead);
char[] chars = Encoding.UTF8.GetChars(bytes);
if (chars.Length == 0)
throw new Exception("Error: Invalid UTF8 char");
int charLen = Encoding.UTF8.GetByteCount(new char[] { chars[0] });
s.Position += (charLen - numRead);
return chars[0];
}

converting string to byte array in c# getting zeros in bytes

i am using below code to read data in a .key extension file to byte array, i am getting 16 bytes output but all bytes are zeros.
System.IO.StreamReader myFile = new System.IO.StreamReader(Server.MapPath("ELECTRI.KEY"));
string myString = myFile.ReadLine();
string str = myString;
//string str = myString.Substring(0, myString.Length - 2);
BitArray bit = new BitArray(str.Length);
for (int i = 0; i < str.Length; i++)
{
if (str.Substring(i, 1) == "1")
{
bit[i] = true;
}
else
{
bit[i] = false;
}
}
int numBytes = bit.Count;
if (bit.Count % 8 != 0) numBytes++;
keyyy = new byte[numBytes];
keyyy = new byte[numBytes];
This sentence means initializing keyyy with new byte[numBytes], so all bytes are zeros. You have to assign each
Here is the code you need to fill the array
int numBytes = bit.Count;
if (bit.Count % 8 != 0) numBytes++;
keyyy = new byte[numBytes];
bit.CopyTo(keyyy, 0);
EDIT
byte[] keyyy = File.ReadAllBytes("ELECTRI.KEY");
I think if I understand your problem correctly now, it is a simple as doing this!
Here is another way you could approach the problem:
string str = "1111111100000000";
byte[] keyyy = new byte[str.Length / 8]
.Select((b, index) => Convert.ToByte(str.Substring(index * 8, 8), 2)).ToArray();
This statement splits the string into groups of 8 characters (ignoring remainders) using Substring. Each string of 8 characters (i.e. 11111111) is then converted into a byte using Convert.ToByte. The paramter 2 in Convert.ToByte let's the method know it is a base2 binary string.
Try this:
byte[] buffer;
using (StreamReader reader= new StreamReader("PATH HERE")) {
var input = reader.ReadToEnd();
buffer = Encoding.UTF8.GetBytes(input);
}
PS: This will read the whole file contents.

How do you read UTF-8 characters from an infinite byte stream - C#

Normally, to read characters from a byte stream you use a StreamReader. In this example I'm reading records delimited by '\r' from an infinite stream.
using(var reader = new StreamReader(stream, Encoding.UTF8))
{
var messageBuilder = new StringBuilder();
var nextChar = 'x';
while (reader.Peek() >= 0)
{
nextChar = (char)reader.Read()
messageBuilder.Append(nextChar);
if (nextChar == '\r')
{
ProcessBuffer(messageBuilder.ToString());
messageBuilder.Clear();
}
}
}
The problem is that the StreamReader has a small internal buffer, so if the code waiting for an 'end of record' delimiter ('\r' in this case) it has to wait until the StreamReader's internal buffer is flushed (usually because more bytes have arrived).
This alternative implementation works for single byte UTF-8 characters, but will fail on multibyte characters.
int byteAsInt = 0;
var messageBuilder = new StringBuilder();
while ((byteAsInt = stream.ReadByte()) != -1)
{
var nextChar = Encoding.UTF8.GetChars(new[]{(byte) byteAsInt});
Console.Write(nextChar[0]);
messageBuilder.Append(nextChar);
if (nextChar[0] == '\r')
{
ProcessBuffer(messageBuilder.ToString());
messageBuilder.Clear();
}
}
How can I modify this code so that it works with multi-byte characters?
Rather than Encoding.UTF8.GetChars which is designed to convert complete buffers, get an instance of Decoder and repeatedly call its member method GetChars this will make use of the Decoder's internal buffer to handle partial multi-byte sequences from the end of one call to the next.
Thanks to Richard, I now have a working infinite stream reader. As he explained, the trick is to use a Decoder instance and call its GetChars method. I've tested it with multi-byte Japanese text and it works fine.
int byteAsInt = 0;
var messageBuilder = new StringBuilder();
var decoder = Encoding.UTF8.GetDecoder();
var nextChar = new char[1];
while ((byteAsInt = stream.ReadByte()) != -1)
{
var charCount = decoder.GetChars(new[] {(byte) byteAsInt}, 0, 1, nextChar, 0);
if(charCount == 0) continue;
Console.Write(nextChar[0]);
messageBuilder.Append(nextChar);
if (nextChar[0] == '\r')
{
ProcessBuffer(messageBuilder.ToString());
messageBuilder.Clear();
}
}
I don't understand why you're not using the stream reader's ReadLine method. If there's a good reason not to, however, it nonetheless seems to me that repeatedly calling GetChars on the decoder is inefficient. Why not make use of the fact that the byte representation of '\r' can't be part of a multi-byte sequence? (Bytes in a multi-byte sequence must be greater than 127; that is, they have the highest bit set.)
var messageBuilder = new List<byte>();
int byteAsInt;
while ((byteAsInt = stream.ReadByte()) != -1)
{
messageBuilder.Add((byte)byteAsInt);
if (byteAsInt == '\r')
{
var messageString = Encoding.UTF8.GetString(messageBuilder.ToArray());
Console.Write(messageString);
ProcessBuffer(messageString);
messageBuilder.Clear();
}
}
Mike,
I found your solution perfect for my situation as well. But I noticed that sometimes it takes four GetChar() calls to determine the characters to be returned. This meant that charCount was 2, while my nextChar buffer size was 1. So I got error "The output character buffer is too small to contain the decoded characters, encoding Unicode fallback System.Text.DecoderReplacementFallback."
I changed my code to:
// ...
var nextChar = new char[4]; // 2 might suffice
for (var i = startPos; i < bytesRead; i++)
{
int charCount;
//...
charCount = decoder.GetChars(buffer, i, 1, nextChar, 0);
if (charCount == 0)
{
bytesSkipped++;
continue;
}
for (int ic = 0; ic < charCount; ic++)
{
char c = nextChar[ic];
charPos++;
// Process character here...
}
}

How to convert UTF-8 byte[] to string

I have a byte[] array that is loaded from a file that I happen to known contains UTF-8.
In some debugging code, I need to convert it to a string. Is there a one-liner that will do this?
Under the covers it should be just an allocation and a memcopy, so even if it is not implemented, it should be possible.
string result = System.Text.Encoding.UTF8.GetString(byteArray);
There're at least four different ways doing this conversion.
Encoding's GetString, but you won't be able to get the original bytes back if those bytes have non-ASCII characters.
BitConverter.ToString The output is a "-" delimited string, but there's no .NET built-in method to convert the string back to byte array.
Convert.ToBase64String You can easily convert the output string back to byte array by using Convert.FromBase64String. Note: The output string could contain '+', '/' and '='. If you want to use the string in a URL, you need to explicitly encode it.
HttpServerUtility.UrlTokenEncodeYou can easily convert the output string back to byte array by using HttpServerUtility.UrlTokenDecode. The output string is already URL friendly! The downside is it needs System.Web assembly if your project is not a web project.
A full example:
byte[] bytes = { 130, 200, 234, 23 }; // A byte array contains non-ASCII (or non-readable) characters
string s1 = Encoding.UTF8.GetString(bytes); // ���
byte[] decBytes1 = Encoding.UTF8.GetBytes(s1); // decBytes1.Length == 10 !!
// decBytes1 not same as bytes
// Using UTF-8 or other Encoding object will get similar results
string s2 = BitConverter.ToString(bytes); // 82-C8-EA-17
String[] tempAry = s2.Split('-');
byte[] decBytes2 = new byte[tempAry.Length];
for (int i = 0; i < tempAry.Length; i++)
decBytes2[i] = Convert.ToByte(tempAry[i], 16);
// decBytes2 same as bytes
string s3 = Convert.ToBase64String(bytes); // gsjqFw==
byte[] decByte3 = Convert.FromBase64String(s3);
// decByte3 same as bytes
string s4 = HttpServerUtility.UrlTokenEncode(bytes); // gsjqFw2
byte[] decBytes4 = HttpServerUtility.UrlTokenDecode(s4);
// decBytes4 same as bytes
A general solution to convert from byte array to string when you don't know the encoding:
static string BytesToStringConverted(byte[] bytes)
{
using (var stream = new MemoryStream(bytes))
{
using (var streamReader = new StreamReader(stream))
{
return streamReader.ReadToEnd();
}
}
}
Definition:
public static string ConvertByteToString(this byte[] source)
{
return source != null ? System.Text.Encoding.UTF8.GetString(source) : null;
}
Using:
string result = input.ConvertByteToString();
Converting a byte[] to a string seems simple, but any kind of encoding is likely to mess up the output string. This little function just works without any unexpected results:
private string ToString(byte[] bytes)
{
string response = string.Empty;
foreach (byte b in bytes)
response += (Char)b;
return response;
}
I saw some answers at this post and it's possible to be considered completed base knowledge, because I have a several approaches in C# Programming to resolve the same problem. The only thing that is necessary to be considered is about a difference between pure UTF-8 and UTF-8 with a BOM.
Last week, at my job, I needed to develop one functionality that outputs CSV files with a BOM and other CSV files with pure UTF-8 (without a BOM). Each CSV file encoding type will be consumed by different non-standardized APIs. One API reads UTF-8 with a BOM and the other API reads without a BOM. I needed to research the references about this concept, reading the "What's the difference between UTF-8 and UTF-8 without BOM?" Stack Overflow question, and the Wikipedia article "Byte order mark" to build my approach.
Finally, my C# Programming for both UTF-8 encoding types (with BOM and pure) needed to be similar to this example below:
// For UTF-8 with BOM, equals shared by Zanoni (at top)
string result = System.Text.Encoding.UTF8.GetString(byteArray);
//for Pure UTF-8 (without B.O.M.)
string result = (new UTF8Encoding(false)).GetString(byteArray);
Using (byte)b.ToString("x2"), Outputs b4b5dfe475e58b67
public static class Ext {
public static string ToHexString(this byte[] hex)
{
if (hex == null) return null;
if (hex.Length == 0) return string.Empty;
var s = new StringBuilder();
foreach (byte b in hex) {
s.Append(b.ToString("x2"));
}
return s.ToString();
}
public static byte[] ToHexBytes(this string hex)
{
if (hex == null) return null;
if (hex.Length == 0) return new byte[0];
int l = hex.Length / 2;
var b = new byte[l];
for (int i = 0; i < l; ++i) {
b[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
}
return b;
}
public static bool EqualsTo(this byte[] bytes, byte[] bytesToCompare)
{
if (bytes == null && bytesToCompare == null) return true; // ?
if (bytes == null || bytesToCompare == null) return false;
if (object.ReferenceEquals(bytes, bytesToCompare)) return true;
if (bytes.Length != bytesToCompare.Length) return false;
for (int i = 0; i < bytes.Length; ++i) {
if (bytes[i] != bytesToCompare[i]) return false;
}
return true;
}
}
There is also class UnicodeEncoding, quite simple in usage:
ByteConverter = new UnicodeEncoding();
string stringDataForEncoding = "My Secret Data!";
byte[] dataEncoded = ByteConverter.GetBytes(stringDataForEncoding);
Console.WriteLine("Data after decoding: {0}", ByteConverter.GetString(dataEncoded));
In addition to the selected answer, if you're using .NET 3.5 or .NET 3.5 CE, you have to specify the index of the first byte to decode, and the number of bytes to decode:
string result = System.Text.Encoding.UTF8.GetString(byteArray, 0, byteArray.Length);
Alternatively:
var byteStr = Convert.ToBase64String(bytes);
The BitConverter class can be used to convert a byte[] to string.
var convertedString = BitConverter.ToString(byteAttay);
Documentation of BitConverter class can be fount on MSDN.
To my knowledge none of the given answers guarantee correct behavior with null termination. Until someone shows me differently I wrote my own static class for handling this with the following methods:
// Mimics the functionality of strlen() in c/c++
// Needed because niether StringBuilder or Encoding.*.GetString() handle \0 well
static int StringLength(byte[] buffer, int startIndex = 0)
{
int strlen = 0;
while
(
(startIndex + strlen + 1) < buffer.Length // Make sure incrementing won't break any bounds
&& buffer[startIndex + strlen] != 0 // The typical null terimation check
)
{
++strlen;
}
return strlen;
}
// This is messy, but I haven't found a built-in way in c# that guarentees null termination
public static string ParseBytes(byte[] buffer, out int strlen, int startIndex = 0)
{
strlen = StringLength(buffer, startIndex);
byte[] c_str = new byte[strlen];
Array.Copy(buffer, startIndex, c_str, 0, strlen);
return Encoding.UTF8.GetString(c_str);
}
The reason for the startIndex was in the example I was working on specifically I needed to parse a byte[] as an array of null terminated strings. It can be safely ignored in the simple case
A LINQ one-liner for converting a byte array byteArrFilename read from a file to a pure ASCII C-style zero-terminated string would be this: Handy for reading things like file index tables in old archive formats.
String filename = new String(byteArrFilename.TakeWhile(x => x != 0)
.Select(x => x < 128 ? (Char)x : '?').ToArray());
I use '?' as the default character for anything not pure ASCII here, but that can be changed, of course. If you want to be sure you can detect it, just use '\0' instead, since the TakeWhile at the start ensures that a string built this way cannot possibly contain '\0' values from the input source.
Try this console application:
static void Main(string[] args)
{
//Encoding _UTF8 = Encoding.UTF8;
string[] _mainString = { "Hello, World!" };
Console.WriteLine("Main String: " + _mainString);
// Convert a string to UTF-8 bytes.
byte[] _utf8Bytes = Encoding.UTF8.GetBytes(_mainString[0]);
// Convert UTF-8 bytes to a string.
string _stringuUnicode = Encoding.UTF8.GetString(_utf8Bytes);
Console.WriteLine("String Unicode: " + _stringuUnicode);
}
Here is a result where you didn’t have to bother with encoding. I used it in my network class and send binary objects as string with it.
public static byte[] String2ByteArray(string str)
{
char[] chars = str.ToArray();
byte[] bytes = new byte[chars.Length * 2];
for (int i = 0; i < chars.Length; i++)
Array.Copy(BitConverter.GetBytes(chars[i]), 0, bytes, i * 2, 2);
return bytes;
}
public static string ByteArray2String(byte[] bytes)
{
char[] chars = new char[bytes.Length / 2];
for (int i = 0; i < chars.Length; i++)
chars[i] = BitConverter.ToChar(bytes, i * 2);
return new string(chars);
}
string result = ASCIIEncoding.UTF8.GetString(byteArray);

Categories

Resources