Hi I'm trying to transform a string containing special characters like û and ….
In my research and tests I almost succeeded using the following function:
public static string ToHex(this string input)
{
char[] values = input.ToCharArray();
string hex = "0x";
string add = "";
foreach (char c in values)
{
int value = Convert.ToInt32(c);
add = String.Format("{0:X}", value).Length == 1 ?
"0" + String.Format("{0:X}", value) + "00"
: String.Format("{0:X}", value) + "00";
hex += add;
}
return hex;
}
If I try to decode ´o¸sçPQ^ûË\u000f±d it does it correctly and turns it into this 0xB4006F00B8007300E700500051005E00FB00CB000F00B1006400,
instead when I try to decode ´o¸sçPQ](ÂF\u0012…a it fails and turns it into 0xB4006F00B8007300E700500051005D002800C200460012002026006100 instead of this
0xB4006F00B8007300E700500051005D002800C2004600120026206100.
Making a minimum of debug I saw that the string is transformed from
´o¸sçPQ](ÂF\u0012…a to ´o¸sçPQ](ÂF.a, I wouldn't want that to be the problem but I'm not sure.
EDIT
0xB4006F00B8007300E700500051005D002800C2004600120026206100 ´o¸sçPQ](ÂF…a CORRECT
0xB4006F00B8007300E700500051005D002800C200460012002026006100 ´o¸sçPQ](ÂF.a MY OUTPUT
0xB4006F00B8007300E700500051005D003D00CB0042000C00A50061006000AD004500BB00 ´o¸sçPQ]=ËB¥a`E» CORRECT
0xB4006F00B8007300E700500051005D003D00CB0042000C00A50061006000AD004500BB00 ´o¸sçPQ]=ËB¥a`E» MY OUTPUT
0xB4006F00B8007300E700500051005D002F00D30042001900B7006E006100 ´o¸sçPQ]/ÓB·na CORRECT
0xB4006F00B8007300E700500051005D002F00D30042001900B7006E006100 ´o¸sçPQ]/ÓB·na MY OUTPUT
0xB4006F00B8007300E700500051005F001A20BC006B0021003500DD00 ´o¸sçPQ_‚¼k!5Ý CORRECT
0xB4006F00B8007300E700500051005F00201A00BC006B0021003500DD00 ´o¸sçPQ_'¼k!5Ý MY OUTPUT
0xB4006F00B8007300E700500051005D002F00EE006B00290014204E004100 ´o¸sçPQ]/îk)—NA CORRECT
0xB4006F00B8007300E700500051005D002F00EE006B0029002014004E004100 ´o¸sçPQ]/îk)-NA MY OUTPUT
0xB4006F00B8007300E700500051005D003800E600690036001C204C004F00 ´o¸sçPQ]8æi6“LO CORRECT
0xB4006F00B8007300E700500051005D003800E60069003600201C004C004F00 ´o¸sçPQ]8æi6"LO MY OUTPUT
0xB4006F00B8007300E700500051005D002F00F3006200390014204E004700C602 ´o¸sçPQ]/ób9—NGˆ CORRECT
0xB4006F00B8007300E700500051005D002F00F300620039002014004E0047002C600 ´o¸sçPQ]/ób9-NG^ MY OUTPUT
0xB4006F00B8007300E700500051005D003B00EE007200330078014100 ´o¸sçPQ];îr3ŸA CORRECT
0xB4006F00B8007300E700500051005D003B00EE0072003300178004100 ´o¸sçPQ];îr3YA MY OUTPUT
0xB4006F00B8007300E700500051005D003000F20064003E009D004B00 ´o¸sçPQ]0òd>K CORRECT
0xB4006F00B8007300E700500051005D003000F20064003E009D004B00 ´o¸sçPQ]0òd>?K MY OUTPUT
0xB4006F00B8007300E700500051005D002F00E60075003E00 ´o¸sçPQ]/æu> CORRECT
0xB4006F00B8007300E700500051005D002F00E60075003E00 ´o¸sçPQ]/æu> MY OUTPUT
0xB4006F00B8007300E700500051005D002F00EE006A003000DC024500 ´o¸sçPQ]/îj0˜E CORRECT
0xB4006F00B8007300E700500051005D002F00EE006A0030002DC004500 ´o¸sçPQ]/îj0~E MY OUTPUT
I thank you in advance for every reply or comment,
greetings.
This is due to endianness, and different integer and string encodings.
char cc = '…';
Console.WriteLine(cc);
// 2026 <-- note, hex value differs from byte representation shown below
Console.WriteLine(((int)cc).ToString("x"));
// 26200000
Console.WriteLine(BytesToHex(BitConverter.GetBytes((int)cc)));
// 2620
Console.WriteLine(BytesToHex(Encoding.GetEncoding("utf-16").GetBytes(new[] { cc })));
You should not treat chars as integers. There are plenty of different ways to encode strings, .net internally uses UTF-16. And all encodings works with bytes, not with integers. Explicit conversion chars to integer can lead to unexpected results, like yours. Why don't you get encoding you need and work with bytes via Encoding.GetBytes?
void Main()
{
// output you expect 0xB4006F00B8007300E700500051005D002800C2004600120026206100
Console.WriteLine(BytesToHex(Encoding.GetEncoding("utf-16").GetBytes("´o¸sçPQ](ÂF\u0012…a")));
}
public static string BytesToHex(byte[] bytes)
{
// whatever way to convert bytes to hex
return "0x" + BitConverter.ToString(bytes).Replace("-", "");
}
Related
Trying to convert a huge hex string to a binary string, but the OverflowException keeps gets thrown. This is my code to convert an image file to a hex string (which when used with a FlowDocument works perfectly!):
string h = new System.Runtime.Remoting.Metadata.W3cXsd2001.SoapHexBinary(System.IO.File.ReadAllBytes(Path)).ToString();
Now, however, I want to take this hex string and convert it to a binary string so that it may also displayed in FlowDocument. First, I tried writing it to a temp text file and then attempt to read it into a byte array:
string TempPath = System.IO.Path.Combine(System.IO.Path.GetTempPath(), "Text.txt");
using (System.IO.StreamWriter sw = new System.IO.StreamWriter(TempPath))
{
sw.WriteLine(Convert.ToString(Convert.ToInt64(h, 16), 2).PadLeft(12, '0'));
}
byte[] c = System.IO.File.ReadAllBytes(TempPath);
When that didn't work, I tried reading it into a string:
string c = System.IO.File.ReadAll(TempPath);
Neither worked and still throw OverflowException. I have also tried just doing this and skipped writing to a file altogether:
string s = Convert.ToString(Convert.ToInt64(h, 16), 2).PadLeft(12, '0')
And despite what approach I take, I still get an exception thrown. How are large strings like this normally handled?
Update
I've modified my algorithm to convert one character at a time, so now it looks like this:
string NewBinary = "";
try
{
int i = 0;
foreach (char c in h)
{
if (i == 100) break;
NewBinary = string.Concat(NewBinary, Convert.ToString(Convert.ToInt64(c.ToString(), 16), 2).PadLeft(12, '0'));
i++;
}
}
The problem with this is that the string is always going to be super long and the code above takes a LONG time to generate the binary string. I limited the length to 100 to test conversion, so the conversion itself is not an issue.
An int64 is represented by a 16 character hex string, which is why attempting to convert a "huge string" causes an OverflowException - the value is more than can be represented by an int64. You will need to break the string up into groups of max 16 chars & convert those to binary & concatenate them.
You could convert a nibble at a time using a lookup array, for example:
public static string HexStringToBinaryString(string hexString)
{
var result = new StringBuilder();
string[] lookup =
{
"0000", "0001", "0010", "0011",
"0100", "0101", "0110", "0111",
"1000", "1001", "1010", "1011",
"1100", "1101", "1110", "1111"
};
foreach (char nibble in hexString.Select(char.ToUpper))
result.Append((nibble > '9') ? lookup[10+nibble-'A'] : lookup[nibble-'0']);
return result.ToString();
}
Convert each hex character of the string into its corresponding binary pattern (eg A becomes 1010 etc)
I am trying to convert a StringBuilder into an Int64 but I get Overflow Exception and I cannot figure out why. The length of the string is not greater than the int. My current code is:
+ $exception {"Value was either too large or too small for an Int64."} System.Exception {System.OverflowException}
private static Int64 EightBit(string Data)
{
StringBuilder Pile = new StringBuilder();
char[] values = Data.ToCharArray();
foreach (char letter in values)
{
// Get the integral value of the character.
int value = Convert.ToInt32(letter);
// Convert the decimal value to a hexadecimal value in string form.
string hexOutput = String.Format("{0:X}", value);
Pile.Append(Convert.ToString(Convert.ToInt32(hexOutput, 16), 2));
}
return Convert.ToInt64(Pile.ToString()); //Error here
}
You are converting the ASCII values of a string to their binary values (Convert.ToString(string, 2) converts to base 2, right?)
character -> ASCII value -> Hex string representation -> Int Value -> Binary string representation of int value -> try to parse that string in base 10.
For instance, if "Data" is "147483647", then "Pile" is "110001110100110111110100111000110011110110110100110111" which is clearly larger than 'long.MaxValue' in base 10. By converting each character in a string to the binary representation of their ASCII values, you are creating a number string with far, far more digits than originally when you finally to to parse it base 10.
Just a quick step-through in the debugger should confirm the problem.
I have a very specific requirement. I have some data. Of which, strings and spaces are to be converted to EBCDIC while numbers to Hexadecimal.
For Example, my string is "Test123"
Test => EBCDIC
123 => Hexadecimal.
What I am trying to do is check every character in string if its number or not, and then based on that doing my conversion.
byte[] dataBuffer = new byte[length];
int i = 0;
if (toEBCDIC)
{
foreach (char c in data)
{
byte[] temp = new byte[1];
if (Char.IsNumber(c))
{
string hexValue = Convert.ToInt32(c).ToString("X");
temp = Encoding.ASCII.GetBytes(hexValue);
dataBuffer[i] = temp[0];
}
else
{
temp = Encoding.GetEncoding("IBM01140").GetBytes(c.ToString());
dataBuffer[i] = temp[0];
}
i++;
}
dataBuffer.CopyTo(array, byteIndex);
The problem comes when i try to convert the number. I need to keep my output in byte array, as i have to write the output to a memory stream and then to a file.
When i get the hex value of number, and then try to convert it to byte, actual conversion happens.
For "1", hexvalue = 31.
Now I want to keep this 31 unchanged in bytes. I mean to say that, when i write it to byte array, it should remain 31 only. But when do GetBytes, it makes byte array, converting 3 and 1 separately to bytes.
Can anyone please help me on this..!!
The problem is here:
ToString("X")
Now it's a hexadecimal string. So in your example, from this point onward, the 3 and the 1 have become separated.
How to fix this: don't convert.
if (Char.IsNumber(c))
{
dataBuffer[i] = (byte)c;
}
Not tested. I think that's what you want. At least, that's what you describe in the last paragraph. That wouldn't make the numbers hexadecimal though - it would make them ASCII, and it's a bit odd to be mixing that with EBCDIC.
You convert the char to its code and then convert that code to string. You don't have to do the second step, instead use the code directly:
if (Char.IsNumber(c))
{
byte hexValue = Convert.ToByte(c);
dataBuffer[i] = hexValue;
}
I need to convert a string into it's binary equivilent and keep it in a string. Then return it back into it's ASCII equivalent.
You can encode a string into a byte-wise representation by using an Encoding, e.g. UTF-8:
var str = "Out of cheese error";
var bytes = Encoding.UTF8.GetBytes(str);
To get back a .NET string object:
var strAgain = Encoding.UTF8.GetString(bytes);
// str == strAgain
You seem to want the representation as a series of '1' and '0' characters; I'm not sure why you do, but that's possible too:
var binStr = string.Join("", bytes.Select(b => Convert.ToString(b, 2)));
Encodings take an abstract string (in the sense that they're an opaque representation of a series of Unicode code points), and map them into a concrete series of bytes. The bytes are meaningless (again, because they're opaque) without the encoding. But, with the encoding, they can be turned back into a string.
You seem to be mixing up "ASCII" with strings; ASCII is simply an encoding that deals only with code-points up to 128. If you have a string containing an 'é', for example, it has no ASCII representation, and so most definitely cannot be represented using a series of ASCII bytes, even though it can exist peacefully in a .NET string object.
See this article by Joel Spolsky for further reading.
You can use these functions for converting to binary and restore it back :
public static string BinaryToString(string data)
{
List<Byte> byteList = new List<Byte>();
for (int i = 0; i < data.Length; i += 8)
{
byteList.Add(Convert.ToByte(data.Substring(i, 8), 2));
}
return Encoding.ASCII.GetString(byteList.ToArray());
}
and for converting string to binary :
public static string StringToBinary(string data)
{
StringBuilder sb = new StringBuilder();
foreach (char c in data.ToCharArray())
{
sb.Append(Convert.ToString(c, 2).PadLeft(8, '0'));
}
return sb.ToString();
}
Hope Helps You.
First convert the string into bytes, as described in my comment and in Cameron's answer; then iterate, convert each byte into an 8-digit binary number (possibly with Convert.ToString, padding appropriately), then concatenate. For the reverse direction, split by 8 characters, run through Convert.ToInt16, build up a byte array, then convert back to a string with GetString.
I'm trying to figure out a way to parse out a base64 string from with a larger string.
I have the string "Hello <base64 content> World" and I want to be able to parse out the base64 content and convert it back to a string. "Hello Awesome World"
Answers in C# preferred.
Edit: Updated with a more real example.
--abcdef
\n
Content-Type: Text/Plain;
Content-Transfer-Encoding: base64
\n
<base64 content>
\n
--abcdef--
This is taken from 1 sample. The problem is that the Content.... vary quite a bit from one record to the next.
There is no reliable way to do it. How would you know that, for instance, "Hello" is not a base64 string ? OK, it's a bad example because base64 is supposed to be padded so that the length is a multiple of 4, but what about "overflow" ? It's 8-character long, it is a valid base64 string (it would decode to "¢÷«~Z0"), even though it's obviously a normal word to a human reader. There's just no way you can tell for sure whether a word is a normal word or base64 encoded text.
The fact that you have base64 encoded text embedded in normal text is clearly a design mistake, I suggest you do something about it rather that trying to do something impossible...
In short form you could:
split the string on any chars that are not valid base64 data or padding
try to convert each token
if the conversion succeeds, call replace on the original string to switch the token with the converted value
In code:
var delimiters = new char[] { /* non-base64 ASCII chars */ };
var possibles = value.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
//need to tweak to include padding chars in matches, but still split on padding?
//maybe better off creating a regex to match base64 + padding
//and using Regex.Split?
foreach(var match in possibles)
{
try
{
var converted = Convert.FromBase64String(match);
var text = System.Text.Encoding.UTF8.GetString(converted);
if(!string.IsNullOrEmpty(text))
{
value = value.Replace(match, text);
}
}
catch (System.ArgumentNullException)
{
//handle it
}
catch (System.FormatException)
{
//handle it
}
}
Without a delimiter though, you can end up converting non-base64 text that happens to be also be valid as base64 encoded text.
Looking at your example of trying to convert "Hello QXdlc29tZQ== World" to "Hello Awesome World" the above algorithm could easily generate something like "ée¡Ý•Í½µ”¢¹]" by trying to convert the whole string from base64 since there is no delimiter between plain and encoded text.
Update (based on comments):
If there are no '\n's in the base64 content and it is always preceded by "Content-Transfer-Encoding: base64\n", then there is a way:
split the string on '\n'
iterate over all the tokens until a token ends in "Content-Transfer-Encoding: base64"
the next token (if there are any) should be decoded (if possible) and then the replacement should be made in the original string
return to iterating until out of tokens
In code:
private string ConvertMixedUpTextAndBase64(string value)
{
var delimiters = new char[] { '\n' };
var possibles = value.Split(delimiters,
StringSplitOptions.RemoveEmptyEntries);
for (int i = 0; i < possibles.Length - 1; i++)
{
if (possibles[i].EndsWith("Content-Transfer-Encoding: base64"))
{
var nextTokenPlain = DecodeBase64(possibles[i + 1]);
if (!string.IsNullOrEmpty(nextTokenPlain))
{
value = value.Replace(possibles[i + 1], nextTokenPlain);
i++;
}
}
}
return value;
}
private string DecodeBase64(string text)
{
string result = null;
try
{
var converted = Convert.FromBase64String(text);
result = System.Text.Encoding.UTF8.GetString(converted);
}
catch (System.ArgumentNullException)
{
//handle it
}
catch (System.FormatException)
{
//handle it
}
return result;
}