I am getting data from a time attendance device using a C++ library from C# 4.0. The problem is that the name field contains some junk values.
The Name field is a byte array, and I tried Encoding.Default.GetString(user.Name), where user is a struct:
[StructLayout(LayoutKind.Sequential, Size = 48, CharSet = CharSet.Ansi), Serializable]
public struct User
{
public int ID;
[MarshalAsAttribute(UnmanagedType.ByValArray, SizeConst = 12)]
public byte[] Name;
}
Output
"Jon\0 41 0"
"rakesh\0 6"
I want to remove \0 41 0 and \0 6.
Any help would be appreciated.
Keep it simple:
static class StringExtensions
{
public static string TrimNullTerminatedString(this string s)
{
// Returns the part of s before the first '\0' (or s itself if there is none)
if (s == null)
throw new ArgumentNullException("s");
int i = s.IndexOf('\0');
if (i >= 0)
return s.Substring(0, i);
return s;
}
}
Use it like this:
string name = Encoding.Default.GetString(user.Name).TrimNullTerminatedString();
That being said, a better option would be to handle this at the declaration level. If Name is a string, there is no reason to declare it as byte[]; declare it as a string, and the marshaler will handle the null terminator properly:
[MarshalAsAttribute(UnmanagedType.ByValTStr, SizeConst = 12)]
public string Name;
It would also be easier to manipulate in code...
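To illustrate the declaration-level approach, here is a minimal sketch that marshals a raw buffer into a struct whose name field is declared as ByValTStr. The struct name UserRecord, the 4 + 12 byte layout, and the sample bytes are assumptions for illustration only; they are not the device's real record format (the original struct is declared with Size = 48).

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical record: a 4-byte ID followed by a 12-byte ANSI name buffer.
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Ansi)]
public struct UserRecord
{
    public int ID;

    // ByValTStr: the marshaler reads the fixed 12-byte buffer as a C string,
    // so everything from the first '\0' onward is dropped.
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 12)]
    public string Name;
}

public static class UserRecordDemo
{
    // Copies a raw byte buffer (e.g. one record returned by the device SDK)
    // into a UserRecord by pinning it and calling Marshal.PtrToStructure.
    public static UserRecord FromBytes(byte[] buffer)
    {
        if (buffer == null) throw new ArgumentNullException("buffer");
        if (buffer.Length < Marshal.SizeOf(typeof(UserRecord)))
            throw new ArgumentException("Buffer is smaller than the struct.", "buffer");

        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            return (UserRecord)Marshal.PtrToStructure(handle.AddrOfPinnedObject(), typeof(UserRecord));
        }
        finally
        {
            handle.Free();
        }
    }

    // Hypothetical sample: ID = 7, name bytes "Jon\0 41 0" padded with zeros.
    public static byte[] SampleBuffer()
    {
        byte[] buffer = new byte[16];
        BitConverter.GetBytes(7).CopyTo(buffer, 0);
        Encoding.ASCII.GetBytes("Jon\0 41 0").CopyTo(buffer, 4);
        return buffer;
    }
}
```

With this declaration, UserRecordDemo.FromBytes(UserRecordDemo.SampleBuffer()).Name should come back as "Jon", without any manual trimming.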
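Regex is one way to remove junk values. In this example, \W matches every character that is not a word character (letter, digit or underscore), and Replace removes them:
textBox1.Text = Regex.Replace("rakesh\0 6", @"\W", ""); // "rakesh6"
Note that digits are word characters, so the trailing 6 is kept.
You can find a library of regular expressions at http://regexlib.com/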
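If you do want a regex, a pattern that cuts the string at the first NUL character (rather than removing non-word characters) fits the null-terminated layout of the data more closely. This is a minimal sketch; the class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class RegexTrim
{
    // \x00 is the regex escape for the NUL character; ".*" then takes the rest.
    // RegexOptions.Singleline lets '.' also match any '\n' in the junk.
    private static readonly Regex FromFirstNul = new Regex(@"\x00.*", RegexOptions.Singleline);

    // Returns everything before the first '\0', or the whole string if there is none.
    public static string CutAtNul(string s)
    {
        return FromFirstNul.Replace(s, string.Empty);
    }
}
```

For the question's data, RegexTrim.CutAtNul("Jon\0 41 0") returns "Jon" and RegexTrim.CutAtNul("rakesh\0 6") returns "rakesh".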
Do it like this:
Regex re = new Regex("[\x0A\x0D]", RegexOptions.Compiled); // matches line feeds and carriage returns
str = re.Replace(str.Trim(), String.Empty);
Or:
string str1 = "";
for (int i = 0; i < str.Length; i++) {
if (char.IsLetter(str[i])) // keep letters only
str1 += str[i];
}
return str1;
You are dealing with null-terminated strings, so you want to strip the zero byte and all bytes after it from your arrays before passing them to Encoding.Default.GetString(byte[]).
Update:
Example code (probably not optimal):
static byte[] RemoveJunk(byte[] input)
{
var end = Array.IndexOf(input, (byte)0);
if (end < 0)
return input;
var result = new byte[end];
Array.Copy(input, result, end);
return result;
}
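A variant of the same idea that skips the intermediate copy: find the terminator with Array.IndexOf and pass the length straight to the Encoding.GetString(byte[], int, int) overload. The names NullTerminated and DecodeNullTerminated are hypothetical, and the sketch assumes a single-byte (or UTF-8) encoding, where a zero byte can only be the terminator.

```csharp
using System;
using System.Text;

public static class NullTerminated
{
    // Decodes bytes up to (not including) the first zero byte.
    // If there is no zero byte, the whole array is decoded.
    public static string DecodeNullTerminated(byte[] input, Encoding encoding)
    {
        if (input == null) throw new ArgumentNullException("input");
        if (encoding == null) throw new ArgumentNullException("encoding");

        int end = Array.IndexOf(input, (byte)0);
        if (end < 0)
            end = input.Length;
        return encoding.GetString(input, 0, end);
    }
}
```

Usage with the question's struct would be string name = NullTerminated.DecodeNullTerminated(user.Name, Encoding.Default);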
I want to use my laptop to communicate with an MES (Manufacturing Execution System).
When I serialized the data (a struct), something unexpected happened.
The code below is what I have done:
using System;
using System.Runtime.InteropServices;
[StructLayout(LayoutKind.Sequential, Pack = 4)]
struct DataPackage
{
public int a;
public ushort b;
public byte c;
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 5)] public string d;
}
class Program
{
static void Main(string[] args)
{
DataPackage pack1 = new DataPackage();
pack1.a = 0x33333301;
pack1.b = 200;
pack1.c = 21;
pack1.d = "hello";
byte[] pack1_serialized = getBytes(pack1);
Console.WriteLine(BitConverter.ToString(pack1_serialized));
byte[] getBytes(DataPackage str)
{
int size = Marshal.SizeOf(str);
byte[] arr = new byte[size];
IntPtr ptr = Marshal.AllocHGlobal(size);
Marshal.StructureToPtr(str, ptr, false); // false: the freshly allocated memory holds no previous structure to clean up
Marshal.Copy(ptr, arr, 0, size);
Marshal.FreeHGlobal(ptr);
return arr;
}
}
}
And here is the outcome:
I want the outcome to be like this:
33-33-33-01-00-C8-15-68-65-6C-6C-6F
So the questions are:
Why are the int/ushort values reversed after marshalling?
Is there another way to send the data in the byte order I want?
Why does the last character, "o", of the string "hello" disappear from the byte array?
Thanks.
1 - Because your expected output is big-endian, while your system is little-endian (least significant byte first), so the byte order is reversed compared to what you expect.
2 - The easiest way is to "convert" your numbers to big-endian before marshalling, that is, change them so that writing them in little-endian order produces the bytes you want. For example:
static int ToBigEndianInt(int x) {
if (!BitConverter.IsLittleEndian)
return x; // already fine
var ar = BitConverter.GetBytes(x);
Array.Reverse(ar);
return BitConverter.ToInt32(ar, 0);
}
static ushort ToBigEndianShort(ushort x) {
if (!BitConverter.IsLittleEndian)
return x; // already fine
var ar = BitConverter.GetBytes(x);
Array.Reverse(ar);
return BitConverter.ToUInt16(ar, 0);
}
And then:
pack1.a = ToBigEndianInt(0x33333301);
pack1.b = ToBigEndianShort(200);
Note that this conversion is not very efficient; if you need more performance, you can do it with bit manipulation.
3 - Because the string is null-terminated, and the null terminator counts toward SizeConst. Since you set it to 5, you get 4 characters of your string plus 1 null terminator. Just increase it to SizeConst = 6 (that may add extra zeroes at the end because of Pack = 4).
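Another option, if you control the sending code, is to skip Marshal.StructureToPtr and write each field yourself in big-endian order with shifts (the bit-manipulation route from point 2). This is a sketch, not the only way: the class name BigEndianPacker is hypothetical, and it assumes d is always sent as exactly 5 ASCII bytes with no terminator, which is what the expected output in the question shows.

```csharp
using System;
using System.Text;

public static class BigEndianPacker
{
    // Serializes (a, b, c, d) as: 4-byte big-endian int, 2-byte big-endian ushort,
    // 1 byte, then exactly 5 ASCII bytes of d (zero-padded, no terminator).
    public static byte[] Serialize(int a, ushort b, byte c, string d)
    {
        const int StringLength = 5;
        byte[] result = new byte[4 + 2 + 1 + StringLength];

        // Most significant byte first = big-endian, regardless of machine endianness.
        // "& 0xFF" keeps each value in byte range, so the casts are safe even in a checked context.
        result[0] = (byte)((a >> 24) & 0xFF);
        result[1] = (byte)((a >> 16) & 0xFF);
        result[2] = (byte)((a >> 8) & 0xFF);
        result[3] = (byte)(a & 0xFF);

        result[4] = (byte)((b >> 8) & 0xFF);
        result[5] = (byte)(b & 0xFF);

        result[6] = c;

        byte[] text = Encoding.ASCII.GetBytes(d ?? string.Empty);
        Array.Copy(text, 0, result, 7, Math.Min(text.Length, StringLength));
        return result;
    }
}
```

With the values from the question, BitConverter.ToString(BigEndianPacker.Serialize(0x33333301, 200, 21, "hello")) gives 33-33-33-01-00-C8-15-68-65-6C-6C-6F.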
I have a byte[] array that is loaded from a file that I happen to know contains UTF-8.
In some debugging code, I need to convert it to a string. Is there a one-liner that will do this?
Under the covers it should be just an allocation and a memcpy, so even if it is not implemented, it should be possible.
string result = System.Text.Encoding.UTF8.GetString(byteArray);
There are at least four different ways to do this conversion:
Encoding's GetString: you won't be able to get the original bytes back if the bytes aren't valid text in that encoding (which is common for arbitrary binary data).
BitConverter.ToString: the output is a "-"-delimited string, but there's no built-in .NET method to convert that string back to a byte array.
Convert.ToBase64String: you can easily convert the output string back to a byte array with Convert.FromBase64String. Note: the output string can contain '+', '/' and '=', so if you want to use it in a URL, you need to encode it explicitly.
HttpServerUtility.UrlTokenEncode: you can easily convert the output string back to a byte array with HttpServerUtility.UrlTokenDecode. The output string is already URL-friendly! The downside is that it needs the System.Web assembly if your project is not a web project.
A full example:
byte[] bytes = { 130, 200, 234, 23 }; // A byte array containing non-ASCII (or non-readable) characters
string s1 = Encoding.UTF8.GetString(bytes); // ���
byte[] decBytes1 = Encoding.UTF8.GetBytes(s1); // decBytes1.Length == 10 !!
// decBytes1 not same as bytes
// Using UTF-8 or other Encoding object will get similar results
string s2 = BitConverter.ToString(bytes); // 82-C8-EA-17
String[] tempAry = s2.Split('-');
byte[] decBytes2 = new byte[tempAry.Length];
for (int i = 0; i < tempAry.Length; i++)
decBytes2[i] = Convert.ToByte(tempAry[i], 16);
// decBytes2 same as bytes
string s3 = Convert.ToBase64String(bytes); // gsjqFw==
byte[] decByte3 = Convert.FromBase64String(s3);
// decByte3 same as bytes
string s4 = HttpServerUtility.UrlTokenEncode(bytes); // gsjqFw2
byte[] decBytes4 = HttpServerUtility.UrlTokenDecode(s4);
// decBytes4 same as bytes
A more general solution when you don't know the encoding: StreamReader detects a byte order mark (UTF-8, UTF-16 or UTF-32) and otherwise falls back to UTF-8:
static string BytesToStringConverted(byte[] bytes)
{
using (var stream = new MemoryStream(bytes))
{
using (var streamReader = new StreamReader(stream))
{
return streamReader.ReadToEnd();
}
}
}
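To show what "don't know the encoding" covers in practice, here is a small sketch with hypothetical names. It can only recognize encodings that announce themselves with a BOM; BOM-less UTF-16 or a legacy code page will silently be decoded as UTF-8.

```csharp
using System.IO;
using System.Linq;
using System.Text;

public static class BomDetectionDemo
{
    // Same idea as BytesToStringConverted: StreamReader(Stream) starts with UTF-8
    // but switches encoding if the data begins with a byte order mark.
    public static string Decode(byte[] bytes)
    {
        using (var stream = new MemoryStream(bytes))
        using (var reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }

    // UTF-16 little-endian text prefixed with its BOM (FF FE).
    public static byte[] Utf16WithBom(string text)
    {
        return Encoding.Unicode.GetPreamble().Concat(Encoding.Unicode.GetBytes(text)).ToArray();
    }
}
```

BomDetectionDemo.Decode(BomDetectionDemo.Utf16WithBom("hi")) returns "hi", while the same UTF-16 bytes without the BOM are read as UTF-8 and come back with embedded '\0' characters.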
Definition:
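public static class ByteArrayExtensions
{
// Decodes the bytes as UTF-8; returns null for a null array.
public static string ConvertByteToString(this byte[] source)
{
return source != null ? System.Text.Encoding.UTF8.GetString(source) : null;
}
}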
Using:
string result = input.ConvertByteToString();
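Converting a byte[] to a string seems simple, but any encoding that rejects some byte sequences can mess up the output string. This little function maps each byte to the char with the same value (0-255, effectively Latin-1), so no byte is ever lost or replaced. Multi-byte UTF-8 text will not come out as the original characters, though: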
private string ToString(byte[] bytes)
{
string response = string.Empty;
foreach (byte b in bytes)
response += (Char)b;
return response;
}
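Because each byte becomes the char with the same numeric value, the mapping can be reversed by casting each char back to byte. A sketch of the round trip, with hypothetical method names; it builds the string from a char[] instead of repeated concatenation, and the inverse assumes every char is in U+0000..U+00FF.

```csharp
using System;

public static class ByteCharMapping
{
    // Each byte b becomes the char (char)b, i.e. U+0000..U+00FF.
    public static string BytesToChars(byte[] bytes)
    {
        char[] chars = new char[bytes.Length];
        for (int i = 0; i < bytes.Length; i++)
            chars[i] = (char)bytes[i];
        return new string(chars);
    }

    // Reverses the mapping; rejects chars that did not come from a single byte.
    public static byte[] CharsToBytes(string s)
    {
        byte[] bytes = new byte[s.Length];
        for (int i = 0; i < s.Length; i++)
        {
            if (s[i] > '\u00FF')
                throw new ArgumentException("Character outside U+0000..U+00FF at index " + i, "s");
            bytes[i] = (byte)s[i];
        }
        return bytes;
    }
}
```

The UTF-8 bytes of "é" (C3 A9) come out as the two characters "Ã©", which is the limitation mentioned above.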
The answers in this post cover the basics well, and C# offers several approaches to the same problem. One thing worth adding is the difference between pure UTF-8 and UTF-8 with a BOM.
Last week at work, I needed to develop a feature that outputs some CSV files with a BOM and others in pure UTF-8 (without a BOM). Each CSV encoding type is consumed by a different non-standardized API: one reads UTF-8 with a BOM and the other reads it without. To build my approach, I researched the concept by reading the "What's the difference between UTF-8 and UTF-8 without BOM?" Stack Overflow question and the Wikipedia article "Byte order mark".
In the end, my C# code for the two UTF-8 variants (with BOM and pure) looked similar to this example:
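// UTF-8 with BOM, same as shared by Zanoni (at top)
string result = System.Text.Encoding.UTF8.GetString(byteArray);
// Pure UTF-8 (without BOM)
string resultNoBom = (new UTF8Encoding(false)).GetString(byteArray);
// Note: the BOM flag only changes GetPreamble(), i.e. what writers such as StreamWriter emit.
// GetString itself never strips a leading BOM; both calls decode it as U+FEFF.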
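Since the BOM flag matters mainly when writing (as with the CSV files), here is a sketch of both sides: writing through StreamWriter with and without a BOM, and stripping a BOM explicitly when decoding. The class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomDemo
{
    // Writes text through a StreamWriter; the encoding's preamble (BOM),
    // if it has one, is emitted at the start of the stream.
    public static byte[] WriteCsv(string text, Encoding encoding)
    {
        var stream = new MemoryStream();
        using (var writer = new StreamWriter(stream, encoding))
        {
            writer.Write(text);
        }
        return stream.ToArray(); // ToArray still works after the stream is closed
    }

    // Encoding.UTF8.GetString keeps a leading BOM as U+FEFF; skip it explicitly.
    public static string DecodeUtf8StripBom(byte[] bytes)
    {
        bool hasBom = bytes.Length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
        int offset = hasBom ? 3 : 0;
        return Encoding.UTF8.GetString(bytes, offset, bytes.Length - offset);
    }
}
```

BomDemo.WriteCsv("a,b", new UTF8Encoding(true)) produces EF BB BF followed by the text, while new UTF8Encoding(false) produces only the 3 text bytes.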
Using b.ToString("x2") on each byte outputs a lowercase hex string such as b4b5dfe475e58b67:
public static class Ext {
public static string ToHexString(this byte[] hex)
{
if (hex == null) return null;
if (hex.Length == 0) return string.Empty;
var s = new StringBuilder();
foreach (byte b in hex) {
s.Append(b.ToString("x2"));
}
return s.ToString();
}
public static byte[] ToHexBytes(this string hex)
{
if (hex == null) return null;
if (hex.Length == 0) return new byte[0];
int l = hex.Length / 2;
var b = new byte[l];
for (int i = 0; i < l; ++i) {
b[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
}
return b;
}
public static bool EqualsTo(this byte[] bytes, byte[] bytesToCompare)
{
if (bytes == null && bytesToCompare == null) return true; // ?
if (bytes == null || bytesToCompare == null) return false;
if (object.ReferenceEquals(bytes, bytesToCompare)) return true;
if (bytes.Length != bytesToCompare.Length) return false;
for (int i = 0; i < bytes.Length; ++i) {
if (bytes[i] != bytesToCompare[i]) return false;
}
return true;
}
}
There is also the UnicodeEncoding class (UTF-16), which is simple to use:
UnicodeEncoding byteConverter = new UnicodeEncoding();
string stringDataForEncoding = "My Secret Data!";
byte[] dataEncoded = byteConverter.GetBytes(stringDataForEncoding);
Console.WriteLine("Data after decoding: {0}", byteConverter.GetString(dataEncoded));
In addition to the selected answer: if you're targeting the .NET Compact Framework 3.5 (Windows CE), the GetString(byte[]) overload isn't available, so you have to specify the index of the first byte to decode and the number of bytes to decode:
string result = System.Text.Encoding.UTF8.GetString(byteArray, 0, byteArray.Length);
Alternatively, if you need a printable (Base64) representation of the bytes rather than decoded text:
var byteStr = Convert.ToBase64String(bytes);
The BitConverter class can be used to convert a byte[] to a string of dash-separated hex pairs (e.g. 82-C8-EA-17):
var convertedString = BitConverter.ToString(byteArray);
Documentation for the BitConverter class can be found on MSDN.
To my knowledge, none of the given answers guarantees correct behavior with null termination. Until someone shows me otherwise, I wrote my own static class to handle this, with the following methods:
// Mimics the functionality of strlen() in C/C++
// Needed because neither StringBuilder nor Encoding.*.GetString() handles \0 well
static int StringLength(byte[] buffer, int startIndex = 0)
{
int strlen = 0;
while
(
(startIndex + strlen) < buffer.Length // Stay within bounds (an unterminated string may run to the last byte)
&& buffer[startIndex + strlen] != 0 // The typical null termination check
)
{
++strlen;
}
return strlen;
}
// This is messy, but I haven't found a built-in way in C# that guarantees null termination
public static string ParseBytes(byte[] buffer, out int strlen, int startIndex = 0)
{
strlen = StringLength(buffer, startIndex);
// Decode only the bytes before the terminator
return Encoding.UTF8.GetString(buffer, startIndex, strlen);
}
The reason for startIndex is that, in the case I was working on, I needed to parse a byte[] as an array of null-terminated strings. It can safely be ignored in the simple case.
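The multi-string case (a byte[] holding several null-terminated strings back to back) can also be handled by repeatedly searching for the next zero byte with Array.IndexOf. A sketch with hypothetical names, assuming a single-byte or UTF-8 encoding where a zero byte can only be a terminator:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CStringList
{
    // Splits a buffer holding consecutive null-terminated strings ("ab\0cd\0...")
    // into separate strings. A final string without a terminator is still returned,
    // and an empty string between two zero bytes is kept.
    public static List<string> ParseAll(byte[] buffer, Encoding encoding)
    {
        var result = new List<string>();
        int start = 0;
        while (start < buffer.Length)
        {
            int end = Array.IndexOf(buffer, (byte)0, start);
            if (end < 0)
                end = buffer.Length; // last string has no terminator
            result.Add(encoding.GetString(buffer, start, end - start));
            start = end + 1; // skip the terminator
        }
        return result;
    }
}
```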
Here is a LINQ one-liner (requires using System.Linq) for converting a byte array byteArrFilename, read from a file, from C-style zero-terminated data to a pure ASCII string. It's handy for reading things like file index tables in old archive formats:
String filename = new String(byteArrFilename.TakeWhile(x => x != 0)
.Select(x => x < 128 ? (Char)x : '?').ToArray());
I use '?' as the default character for anything not pure ASCII here, but that can be changed, of course. If you want to be able to detect non-ASCII bytes reliably, just use '\0' instead, since the TakeWhile at the start ensures that a string built this way cannot contain '\0' values from the input source.
Try this console application:
static void Main(string[] args)
{
//Encoding _UTF8 = Encoding.UTF8;
string[] _mainString = { "Hello, World!" };
Console.WriteLine("Main String: " + _mainString[0]); // index 0: concatenating the array itself prints "System.String[]"
// Convert a string to UTF-8 bytes.
byte[] _utf8Bytes = Encoding.UTF8.GetBytes(_mainString[0]);
// Convert UTF-8 bytes to a string.
string _stringuUnicode = Encoding.UTF8.GetString(_utf8Bytes);
Console.WriteLine("String Unicode: " + _stringuUnicode);
}
Here is an approach where you don't have to bother with encoding: it copies the two bytes of each char directly. I used it in my network class to send binary objects as strings.
public static byte[] String2ByteArray(string str)
{
char[] chars = str.ToCharArray();
byte[] bytes = new byte[chars.Length * 2];
for (int i = 0; i < chars.Length; i++)
Array.Copy(BitConverter.GetBytes(chars[i]), 0, bytes, i * 2, 2);
return bytes;
}
public static string ByteArray2String(byte[] bytes)
{
char[] chars = new char[bytes.Length / 2];
for (int i = 0; i < chars.Length; i++)
chars[i] = BitConverter.ToChar(bytes, i * 2);
return new string(chars);
}
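The same raw copy can be done in one call with Buffer.BlockCopy, which, like the BitConverter loop, uses the machine's byte order. On a little-endian machine the result is identical to Encoding.Unicode (UTF-16LE). A sketch with hypothetical names:

```csharp
using System;

public static class RawChars
{
    // Copies the string's UTF-16 code units as raw bytes (machine byte order).
    public static byte[] ToBytes(string s)
    {
        byte[] bytes = new byte[s.Length * sizeof(char)];
        Buffer.BlockCopy(s.ToCharArray(), 0, bytes, 0, bytes.Length);
        return bytes;
    }

    // Inverse of ToBytes; a trailing odd byte is ignored, as in ByteArray2String.
    public static string FromBytes(byte[] bytes)
    {
        char[] chars = new char[bytes.Length / sizeof(char)];
        Buffer.BlockCopy(bytes, 0, chars, 0, chars.Length * sizeof(char));
        return new string(chars);
    }
}
```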
string result = ASCIIEncoding.UTF8.GetString(byteArray); // ASCIIEncoding.UTF8 is the inherited static Encoding.UTF8, so this decodes UTF-8, not ASCII
I know this has been asked before, but after reading the other questions I still have no solution. I have a file that was written with the following C++ struct:
typedef struct myStruct{
char Name[127];
char s1[2];
char MailBox[149];
char s2[2];
char RouteID[10];
} MY_STRUCT;
My approach was to parse one field of the struct at a time, but I cannot get s1 and MailBox to parse correctly. In the file, the s1 field contains "\r\n" (binary 0D0A), and this causes my parsing code to parse the MailBox field incorrectly. Here's my parsing code:
[StructLayout(LayoutKind.Explicit, Size = 0x80 + 0x2 + 0x96)]
unsafe struct MY_STRUCT
{
[FieldOffset(0)]
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 0x80)]
public string Name;
[FieldOffset(0x80)]
public fixed char s1[2];
/* Does not work, "Could not load type 'MY_STRUCT' ... because it contains an object field at offset 130 that is incorrectly aligned or overlapped by a non-object field." */
[FieldOffset(0x80 + 0x2)]
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 0x96)]
public string MailBox;
}
If I comment out the last field and reduce the struct's size to 0x80+0x2 it will work correctly for the first two variables.
One thing to note is that the Name and MailBox strings contain the null terminator, but s1 doesn't. That seems to be what breaks the parser, and I don't know why, because to me the code explicitly tells the marshaler that s1 is only a fixed 2-char buffer, not a null-terminated string.
Here is a pic of my test data (in code I seek past the first row in the BinaryReader, so "Name" begins at 0x0, not 0x10).
Here's one way. It doesn't use unsafe code (nor is it particularly elegant or efficient):
using System.Text;
using System.IO;
namespace ReadCppStruct
{
/*
typedef struct myStruct{
char Name[127];
char s1[2];
char MailBox[149];
char s2[2];
char RouteID[10];
} MY_STRUCT;
*/
class MyStruct
{
public string Name { get; set; }
public string MailBox { get; set; }
public string RouteID { get; set; }
}
class Program
{
static string GetString(Encoding encoding, byte[] bytes, int index, int count)
{
string retval = encoding.GetString(bytes, index, count);
int nullIndex = retval.IndexOf('\0');
if (nullIndex != -1)
retval = retval.Substring(0, nullIndex);
return retval;
}
static MyStruct ReadStruct(string path)
{
byte[] bytes = File.ReadAllBytes(path);
var utf8 = new UTF8Encoding();
var retval = new MyStruct();
int index = 0; int cb = 127;
retval.Name = GetString(utf8, bytes, index, cb);
index += cb + 2;
cb = 149;
retval.MailBox = GetString(utf8, bytes, index, cb);
index += cb + 2;
cb = 10;
retval.RouteID = GetString(utf8, bytes, index, cb);
return retval;
} // http://stackoverflow.com/questions/30742019/reading-binary-file-into-struct
static void Main(string[] args)
{
MyStruct ms = ReadStruct("MY_STRUCT.data");
}
}
}
Here's how I got it to work for me:
public static unsafe string BytesToString(byte* bytes, int len)
{
return new string((sbyte*)bytes, 0, len).Trim(new char[] { ' ' }); // trim leading and trailing spaces (but keep newline characters)
}
[StructLayout(LayoutKind.Explicit, Size = 127 + 2 + 149 + 2 + 10)]
unsafe struct USRRECORD_ANSI
{
[FieldOffset(0)]
public fixed byte Name[127];
[FieldOffset(127)]
public fixed byte s1[2];
[FieldOffset(127 + 2)]
public fixed byte MailBox[149];
[FieldOffset(127 + 2 + 149)]
public fixed byte s2[2];
[FieldOffset(127 + 2 + 149 + 2)]
public fixed byte RouteID[10];
}
After the struct has been parsed, I can access the strings by calling the BytesToString method, e.g. string name = BytesToString(record.Name, 127);
I noticed that I don't need the Size property in the StructLayout attribute. I'm not sure whether it's best practice to keep it or remove it. Any ideas?
Your struct sizes don't add up. MailBox is 149 bytes (0x95) in MY_STRUCT, not 0x96 as in your C# code. Likewise, Name is 127 bytes (0x7F), not 0x80.
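One way to avoid computing offsets by hand is to let the marshaler lay the fields out: with LayoutKind.Sequential and Pack = 1, the sizes can be copied straight from the C declaration (127, 2, 149, 2, 10), and no FieldOffset values are needed, so the "incorrectly aligned" load error from explicit layout does not arise. This is a sketch; the struct and class names are hypothetical, and it assumes the file stores single-byte ANSI text.

```csharp
using System;
using System.Runtime.InteropServices;

// Mirrors: char Name[127]; char s1[2]; char MailBox[149]; char s2[2]; char RouteID[10];
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Ansi, Pack = 1)]
public struct MyStructRecord
{
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 127)]
    public string Name;

    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 2)]
    public byte[] s1;        // raw separator bytes, e.g. 0D 0A

    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 149)]
    public string MailBox;

    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 2)]
    public byte[] s2;

    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 10)]
    public string RouteID;
}

public static class RecordReader
{
    // Marshals one 290-byte record (127 + 2 + 149 + 2 + 10) starting at offset.
    public static MyStructRecord Read(byte[] data, int offset)
    {
        int size = Marshal.SizeOf(typeof(MyStructRecord));
        if (offset < 0 || data.Length - offset < size)
            throw new ArgumentException("Not enough data for one record.");

        IntPtr ptr = Marshal.AllocHGlobal(size);
        try
        {
            Marshal.Copy(data, offset, ptr, size);
            return (MyStructRecord)Marshal.PtrToStructure(ptr, typeof(MyStructRecord));
        }
        finally
        {
            Marshal.FreeHGlobal(ptr);
        }
    }
}
```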
If I do
char c = 'A';
byte[] b = BitConverter.GetBytes(c);
Length of b is 2.
However, if I have the following struct for interop purposes
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct MyStruct
{
int i;
[MarshalAs(UnmanagedType.ByValArray, SizeConst = 8)]
char[] c;
public int TheInt
{
get { return i; }
set { i = value; }
}
public string TheString
{
get { return new string(c); }
set { c = value.ToCharArray(); }
}
}
then do
MyStruct m = new MyStruct();
m.TheInt = 10;
m.TheString = "Balloons";
int mSize = Marshal.SizeOf(m);
mSize is 12, not 20 as I expected.
MSDN says char storage is 2 bytes.
The first example supports this.
Am I doing something wrong with my struct?
Am I missing something?
Because you are marshalling, and by default a char is marshalled as an ANSI char instead of a Unicode char. "Balloons" is 8 characters, which is 8 bytes when ANSI-encoded, plus 4 bytes for your int, which makes 12.
If you want the size to be 20 for marshalling, change your StructLayout and set CharSet to Unicode:
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode, Pack = 1)]
Now you will have your struct size as 20.
MSDN says char storage is 2 bytes.
That is true when we are talking about a CLR char, but not in the context of marshalling.
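The difference can be checked directly with Marshal.SizeOf. A sketch with two hypothetical structs that differ only in CharSet (the default CharSet of StructLayout is Ansi):

```csharp
using System.Runtime.InteropServices;

// Default CharSet (Ansi): each char in the ByValArray marshals as 1 byte.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct AnsiChars
{
    public int i;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 8)]
    public char[] c;
}

// CharSet.Unicode: each char marshals as a 2-byte UTF-16 code unit.
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode, Pack = 1)]
public struct UnicodeChars
{
    public int i;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 8)]
    public char[] c;
}

public static class CharSetSizes
{
    public static int AnsiSize() { return Marshal.SizeOf(typeof(AnsiChars)); }       // 4 + 8 * 1
    public static int UnicodeSize() { return Marshal.SizeOf(typeof(UnicodeChars)); } // 4 + 8 * 2
}
```

CharSetSizes.AnsiSize() gives 12, matching the question, and CharSetSizes.UnicodeSize() gives 20.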
A char is 2 bytes: a 16-bit Unicode character (U+0000 to U+FFFF).
char[] is a reference (array) type.
An int is 4 bytes.
Hence, as far as marshalling goes, I would go with vcsjones' answer.