Convert C# string to C char array

I am sending a string from C# to C via sockets:
write 5000 100
In C, I split the received string using spaces.
char **params = str_split(buffer, ' ');
And then I access the 3rd parameter and convert 100 into a C char. However, I need to be able to send an array of chars from C# (1 byte each) so that I can use them in C.
For instance, let's say I need to send the following string:
write 5000 <byte[] { 0x01, 0x20, 0x45 }>
Of course, the byte array needs to be transformed into string characters in C# that can be sent via StreamWriter. StreamWriter accepts an array of chars, but those are 2 bytes each, and I need 1 byte each.
How can this be accomplished?

In C, a char is 1 byte. So, to match that from C#, you will need to send bytes.
It seems like you need two different inputs for your problem:
C#
string textFront = "write 5000"; //input 1
byte[] bytes = new byte[] { 0x01, 0x20, 0x45 }; //input 2
To send them together, I would rather use a Stream, which lets you write a byte[] directly. We only need to (1) convert textFront into a byte[], (2) concatenate it with bytes, and (3) send the combined array.
byte[] frontBytes = Encoding.ASCII.GetBytes(textFront); // (1) text -> bytes
byte[] combined = new byte[frontBytes.Length + bytes.Length];
frontBytes.CopyTo(combined, 0);
bytes.CopyTo(combined, frontBytes.Length); // (2) concatenate the two arrays
// (3) write the combined bytes to the raw Stream of your socket (e.g. a NetworkStream);
// a StreamWriter cannot be used here, because it writes text rather than raw bytes.
Stream stream = networkStream; // networkStream is a placeholder; see the sketch below
stream.Write(combined, 0, combined.Length);
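For context, here is a minimal sketch of where that stream could come from, assuming the connection is made with a TcpClient (the host name and port below are placeholders, not values from the question; requires using System.Net.Sockets):
using (TcpClient tcpClient = new TcpClient("device-host", 9000)) // placeholder host/port
using (NetworkStream networkStream = tcpClient.GetStream())
{
    // send the text prefix and the raw payload bytes in one write
    networkStream.Write(combined, 0, combined.Length);
}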

I don't quite understand your question; is this what you are looking for?
byte[] bytes = Encoding.UTF8.GetBytes("your string");
and vice versa
string text = Encoding.UTF8.GetString(bytes);

The StreamWriter constructor can take an Encoding parameter. Maybe that's what you want.
var sw = new StreamWriter(your_stream, Encoding.ASCII);
sw.Write("something");
There is also the BinaryWriter class that can write strings and byte[].
var bw = new BinaryWriter(output_stream, Encoding.ASCII);
bw.Write("something");
bw.Write(new byte[] { 0x01, 0x20, 0x45 });
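One caveat worth noting: BinaryWriter.Write(string) writes a length-prefixed string (a 7-bit-encoded length followed by the encoded characters), which a C parser expecting plain text would have to account for. A sketch that avoids the prefix by writing the encoded bytes directly (reusing output_stream from above):
var bw = new BinaryWriter(output_stream, Encoding.ASCII);
bw.Write(Encoding.ASCII.GetBytes("write 5000 ")); // raw bytes, no length prefix
bw.Write(new byte[] { 0x01, 0x20, 0x45 });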

C# ByteArray to string conversion and back

I have a uint value that I need to represent as a byte array and then convert into a string.
When I convert the string back to a byte array, I get different values.
I'm using the standard ASCII converter, so I don't understand why I'm getting different values.
To be more clear, this is what I'm doing:
byte[] bArray = BitConverter.GetBytes((uint)49694);
string test = System.Text.Encoding.ASCII.GetString(bArray);
byte[] result = Encoding.ASCII.GetBytes(test);
The resulting byte array is different from the first one:
bArray ->
[0x00000000]: 0x1e
[0x00000001]: 0xc2
[0x00000002]: 0x00
[0x00000003]: 0x00
result ->
[0x00000000]: 0x1e
[0x00000001]: 0x3f
[0x00000002]: 0x00
[0x00000003]: 0x00
Notice that byte 1 is different in the two arrays.
Thanks for your support.
Regards
string test = System.Text.Encoding.ASCII.GetString(bArray);
byte[] result = Encoding.ASCII.GetBytes(test);
Because the raw data is not ASCII. Encoding.GetString is only meaningful if the data you are decoding is text in that encoding; apply it to anything else and you corrupt the data. If you want to store a byte[] as a string, then a base-n representation is necessary, typically base-64, because (a) it is conveniently available (Convert.{To|From}Base64String) and (b) it fits into ASCII, so you rarely hit code-page / encoding issues. For example:
byte[] bArray = BitConverter.GetBytes((uint)49694);
string test = Convert.ToBase64String(bArray); // "HsIAAA=="
byte[] result = Convert.FromBase64String(test);
Because 0xC2 is not a valid ASCII character, it is replaced with '?' (0x3F).
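To see the lossiness in isolation, a small sketch (index 1 is the 0xC2 byte from the dump above):
byte[] bArray = BitConverter.GetBytes((uint)49694); // { 0x1E, 0xC2, 0x00, 0x00 }
string lossy = Encoding.ASCII.GetString(bArray);
Console.WriteLine((int)lossy[1]); // 63, i.e. '?': 0xC2 is outside the 7-bit ASCII range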
Converting an arbitrary byte array to a string using SomeEncoding.GetString() is not a safe approach, as @activwerx suggested in the comments. Instead, use Convert.ToBase64String and Convert.FromBase64String.

Convert a binary number to ASCII characters

I am reading information from a device and it's returning data to me in integer format and I need to convert this to ASCII characters using C#.
The following is an example of the data I have to convert. I receive the integer value 26990 back from my device and I need to convert that to ASCII. I just happen to know that for this value, the desired result would be "ni".
I know that the integer value 26990 is equal to 696e in hex, and 110100101101110 in binary, but would like to know how to do the conversion as I can't work it out for myself.
Can anyone help please?
Many thanks,
Karl
int i = 26990;                       // 0x696E
char c1 = (char)(i & 0xff);          // low byte  0x6E -> 'n'
char c2 = (char)(i >> 8);            // high byte 0x69 -> 'i'
Console.WriteLine("{0}{1}", c1, c2); // prints "ni"
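The same masking idea extends to all four bytes of an int, should the device ever pack more than two characters into one value; this is only a sketch under that assumption:
int value = 26990;
var sb = new System.Text.StringBuilder();
for (int shift = 0; shift < 32; shift += 8)
{
    char c = (char)((value >> shift) & 0xFF); // take one byte at a time, low byte first
    if (c != '\0')
        sb.Append(c);
}
Console.WriteLine(sb.ToString()); // "ni"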
Use BitConverter and Encoding to perform the conversion:
class Program
{
    public static void Main()
    {
        int d = 26990;
        byte[] bytes = BitConverter.GetBytes(d);
        string s = System.Text.Encoding.ASCII.GetString(bytes);
        // note that s will contain extra NULLs here so..
        s = s.TrimEnd('\0');
    }
}
If the device is sending bytes you can use something like this:
byte[] bytes = new byte[] { 0x69, 0x6e };
string text = System.Text.Encoding.ASCII.GetString(bytes);
Otherwise, if the device is sending integers you can use something like this:
byte[] bytes = BitConverter.GetBytes(26990);
string text = System.Text.Encoding.ASCII.GetString(bytes, 0, 2);
In either case, text will contain "ni" after that executes. Note that in the latter you may have to deal with endianness issues.
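As a sketch of what that endianness handling might look like (assuming the device sends the low byte first, i.e. little-endian order): BitConverter uses the machine's native byte order, so the array would need to be reversed on a big-endian system:
byte[] bytes = BitConverter.GetBytes(26990);
if (!BitConverter.IsLittleEndian)
    Array.Reverse(bytes); // match the little-endian order assumed above
string text = Encoding.ASCII.GetString(bytes, 0, 2); // "ni"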
Further reference:
Encoding class
Encoding.ASCII property
Encoding.GetString method overloads

How do I load a string into a FileStream without going to disk?

string abc = "This is a string";
How do I load abc into a FileStream?
FileStream input = new FileStream(.....);
Use a MemoryStream instead...
MemoryStream ms = new MemoryStream(System.Text.Encoding.ASCII.GetBytes(abc));
Remember that a MemoryStream (just like a FileStream) needs to be closed when you have finished with it. You can always place your code in a using block to make this easier...
using(MemoryStream ms = new MemoryStream(System.Text.Encoding.ASCII.GetBytes(abc)))
{
//use the stream here and don't worry about needing to close it
}
NOTE: If your string is Unicode rather than ASCII, you may want to specify this when converting to a byte array. Basically, a UTF-16 character takes up 2 bytes instead of 1, with the extra byte being zero for plain ASCII characters (e.g. with Encoding.Unicode, which is UTF-16LE, "a" becomes 0x61 0x00, whereas in ASCII it is just 0x61).
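For example, if the consumer of the stream expects UTF-16 rather than ASCII, the same pattern works with Encoding.Unicode (a sketch, reusing the abc variable from above):
using (MemoryStream ms = new MemoryStream(Encoding.Unicode.GetBytes(abc)))
{
    // each character in the buffer now occupies 2 bytes
}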

How do I ignore the UTF-8 Byte Order Marker in String comparisons?

I'm having a problem comparing strings in a Unit Test in C# 4.0 using Visual Studio 2010. This same test case works properly in Visual Studio 2008 (with C# 3.5).
Here's the relevant code snippet:
byte[] rawData = GetData();
string data = Encoding.UTF8.GetString(rawData);
Assert.AreEqual("Constant", data, false, CultureInfo.InvariantCulture);
While debugging this test, the data string appears to the naked eye to contain exactly the same string as the literal. When I called data.ToCharArray(), I noticed that the first character of data is the value 65279 (U+FEFF), the byte order mark. What I don't understand is why Encoding.UTF8.GetString() keeps this character around.
How do I get Encoding.UTF8.GetString() to not put the Byte Order Marker in the resulting string?
Update: The problem was that GetData(), which reads a file from disk, reads the data from the file using FileStream.readbytes(). I corrected this by using a StreamReader and converting the string to bytes using Encoding.UTF8.GetBytes(), which is what it should've been doing in the first place! Thanks for all the help.
Well, I assume it's because the raw binary data includes the BOM. You could always remove the BOM yourself after decoding, if you don't want it, but you should consider whether the byte array should contain the BOM to start with.
EDIT: Alternatively, you could use a StreamReader to perform the decoding. Here's an example, showing the same byte array being converted into two characters using Encoding.GetString or one character via a StreamReader:
using System;
using System.IO;
using System.Text;

class Test
{
    static void Main()
    {
        byte[] withBom = { 0xef, 0xbb, 0xbf, 0x41 };
        string viaEncoding = Encoding.UTF8.GetString(withBom);
        Console.WriteLine(viaEncoding.Length);

        string viaStreamReader;
        using (StreamReader reader = new StreamReader(new MemoryStream(withBom), Encoding.UTF8))
        {
            viaStreamReader = reader.ReadToEnd();
        }
        Console.WriteLine(viaStreamReader.Length);
    }
}
There is a slightly more efficient way to do it than creating a StreamReader and a MemoryStream:
1) If you know that there is always a BOM:
string viaEncoding = Encoding.UTF8.GetString(withBom, 3, withBom.Length - 3);
2) If you don't know, check:
string viaEncoding;
if (withBom.Length >= 3 && withBom[0] == 0xEF && withBom[1] == 0xBB && withBom[2] == 0xBF)
viaEncoding = Encoding.UTF8.GetString(withBom, 3, withBom.Length - 3);
else
viaEncoding = Encoding.UTF8.GetString(withBom);
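A small variation on the check above, a sketch using Encoding.UTF8.GetPreamble() instead of hard-coded byte values (SequenceEqual requires using System.Linq):
byte[] preamble = Encoding.UTF8.GetPreamble(); // { 0xEF, 0xBB, 0xBF }
bool hasBom = withBom.Length >= preamble.Length
              && withBom.Take(preamble.Length).SequenceEqual(preamble);
string decoded = hasBom
    ? Encoding.UTF8.GetString(withBom, preamble.Length, withBom.Length - preamble.Length)
    : Encoding.UTF8.GetString(withBom);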
Unfortunately the BOM won't be removed with a simple Trim(). But it can be done as follows:
byte[] withBom = { 0xef, 0xbb, 0xbf, 0x41 };
byte[] bom = { 0xef, 0xbb, 0xbf };
var text = System.Text.Encoding.UTF8.GetString(withBom);
Console.WriteLine($"Untrimmed: {text.Length}, {text}");
var trimmed = text.Trim(System.Text.Encoding.UTF8.GetString(bom).ToCharArray());
Console.WriteLine($"Trimmed: {trimmed.Length}, {trimmed}");
Output:
Untrimmed: 2, A
Trimmed: 1, A
I believe the extra character is removed if you Trim() the decoded string
