From a third part i get a string like this "123123"
Ill have to wrap it into some XML however i get this error System.ArgumentException: '', hexadecimal value 0x04, is an invalid character.
Can I either decode the hex value to something meaningful, or just delete it. The solution must be able to handle other hex values as well.
I ended up creating this method
public static string RemoveInvalidXmlChars(string str)
{
var sb = new StringBuilder();
var decodedString = HttpUtility.HtmlDecode(str);
foreach (var c in decodedString)
{
if (XmlConvert.IsXmlChar(c))
sb.Append(c);
}
return sb.ToString();
}
Related
I've ran into a curious behaviour when trying to hash a string password and then display the hash in console.
My code is:
static void Main(string[] args)
{
string password = "password";
ConvertPasswordToHash(password);
}
private static void ConvertPasswordToHash(string password)
{
using (HashAlgorithm sha = SHA256.Create())
{
byte[] result = sha.ComputeHash(Encoding.UTF8.GetBytes(password));
string hashText = Encoding.UTF8.GetString(result);
Console.WriteLine(hashText);
StringBuilder sb = new StringBuilder();
foreach (var item in result)
{
sb.Append((char)item);
}
Console.WriteLine(sb);
}
}
The problem is two fold:
The hashTest and sb contain different values (both are 32 characters long before outputting) and 2) Console outputs are even stranger. They are not 32 characters in length and second the outputs are slightly different:
When examining the strings before outputting them, I've noticed that hashText contains for instance \u0004, which could be a unicode character of some sort while sb does not contain that at all (that is before outputting the values into the console).
My questions are:
Which way is the correct way of getting a string of chars from the provided array of bytes?
Why are the console outputs different but only so slightly? It does not look like it is the fault of using the wrong Encoding.
How do I output the correct hash (32 symbols) into the console? Ive tried adding '#' before the strings to cancel any possible carriage returns etc... Pretty much without any result.
Maybe I am missing something obvious. Thank you.
The correct logic should be as follows:
private static void ConvertPasswordToHash(string password)
{
using (HashAlgorithm sha = SHA256.Create())
{
byte[] result = sha.ComputeHash(Encoding.UTF8.GetBytes(password));
StringBuilder sb = new StringBuilder();
foreach (var item in result)
{
sb.Append(item.ToString("x2"));
}
Console.WriteLine(sb);
}
}
ToString("x2") formats the string as two hexadecimal characters.
Live example: https://dotnetfiddle.net/QkREkX
Another way is just to represent your byte[] array as a base 64 string, no StringBuilder required.
byte[] result = sha.ComputeHash(Encoding.UTF8.GetBytes(password));
Console.WriteLine(Convert.ToBase64String(result));
Hi I'm trying to transform a string containing special characters like û and ….
In my research and tests I almost succeeded using the following function:
public static string ToHex(this string input)
{
char[] values = input.ToCharArray();
string hex = "0x";
string add = "";
foreach (char c in values)
{
int value = Convert.ToInt32(c);
add = String.Format("{0:X}", value).Length == 1 ?
"0" + String.Format("{0:X}", value) + "00"
: String.Format("{0:X}", value) + "00";
hex += add;
}
return hex;
}
If I try to decode ´o¸sçPQ^ûË\u000f±d it does it correctly and turns it into this 0xB4006F00B8007300E700500051005E00FB00CB000F00B1006400,
instead when I try to decode ´o¸sçPQ](ÂF\u0012…a it fails and turns it into 0xB4006F00B8007300E700500051005D002800C200460012002026006100 instead of this
0xB4006F00B8007300E700500051005D002800C2004600120026206100.
Making a minimum of debug I saw that the string is transformed from
´o¸sçPQ](ÂF\u0012…a to ´o¸sçPQ](ÂF.a, I wouldn't want that to be the problem but I'm not sure.
EDIT
0xB4006F00B8007300E700500051005D002800C2004600120026206100 ´o¸sçPQ](ÂF…a CORRECT
0xB4006F00B8007300E700500051005D002800C200460012002026006100 ´o¸sçPQ](ÂF.a MY OUTPUT
0xB4006F00B8007300E700500051005D003D00CB0042000C00A50061006000AD004500BB00 ´o¸sçPQ]=ËB¥a`E» CORRECT
0xB4006F00B8007300E700500051005D003D00CB0042000C00A50061006000AD004500BB00 ´o¸sçPQ]=ËB¥a`E» MY OUTPUT
0xB4006F00B8007300E700500051005D002F00D30042001900B7006E006100 ´o¸sçPQ]/ÓB·na CORRECT
0xB4006F00B8007300E700500051005D002F00D30042001900B7006E006100 ´o¸sçPQ]/ÓB·na MY OUTPUT
0xB4006F00B8007300E700500051005F001A20BC006B0021003500DD00 ´o¸sçPQ_‚¼k!5Ý CORRECT
0xB4006F00B8007300E700500051005F00201A00BC006B0021003500DD00 ´o¸sçPQ_'¼k!5Ý MY OUTPUT
0xB4006F00B8007300E700500051005D002F00EE006B00290014204E004100 ´o¸sçPQ]/îk)—NA CORRECT
0xB4006F00B8007300E700500051005D002F00EE006B0029002014004E004100 ´o¸sçPQ]/îk)-NA MY OUTPUT
0xB4006F00B8007300E700500051005D003800E600690036001C204C004F00 ´o¸sçPQ]8æi6“LO CORRECT
0xB4006F00B8007300E700500051005D003800E60069003600201C004C004F00 ´o¸sçPQ]8æi6"LO MY OUTPUT
0xB4006F00B8007300E700500051005D002F00F3006200390014204E004700C602 ´o¸sçPQ]/ób9—NGˆ CORRECT
0xB4006F00B8007300E700500051005D002F00F300620039002014004E0047002C600 ´o¸sçPQ]/ób9-NG^ MY OUTPUT
0xB4006F00B8007300E700500051005D003B00EE007200330078014100 ´o¸sçPQ];îr3ŸA CORRECT
0xB4006F00B8007300E700500051005D003B00EE0072003300178004100 ´o¸sçPQ];îr3YA MY OUTPUT
0xB4006F00B8007300E700500051005D003000F20064003E009D004B00 ´o¸sçPQ]0òd>K CORRECT
0xB4006F00B8007300E700500051005D003000F20064003E009D004B00 ´o¸sçPQ]0òd>?K MY OUTPUT
0xB4006F00B8007300E700500051005D002F00E60075003E00 ´o¸sçPQ]/æu> CORRECT
0xB4006F00B8007300E700500051005D002F00E60075003E00 ´o¸sçPQ]/æu> MY OUTPUT
0xB4006F00B8007300E700500051005D002F00EE006A003000DC024500 ´o¸sçPQ]/îj0˜E CORRECT
0xB4006F00B8007300E700500051005D002F00EE006A0030002DC004500 ´o¸sçPQ]/îj0~E MY OUTPUT
I thank you in advance for every reply or comment,
greetings.
This is due to endianness, and different integer and string encodings.
char cc = '…';
Console.WriteLine(cc);
// 2026 <-- note, hex value differs from byte representation shown below
Console.WriteLine(((int)cc).ToString("x"));
// 26200000
Console.WriteLine(BytesToHex(BitConverter.GetBytes((int)cc)));
// 2620
Console.WriteLine(BytesToHex(Encoding.GetEncoding("utf-16").GetBytes(new[] { cc })));
You should not treat chars as integers. There are plenty of different ways to encode strings, .net internally uses UTF-16. And all encodings works with bytes, not with integers. Explicit conversion chars to integer can lead to unexpected results, like yours. Why don't you get encoding you need and work with bytes via Encoding.GetBytes?
void Main()
{
// output you expect 0xB4006F00B8007300E700500051005D002800C2004600120026206100
Console.WriteLine(BytesToHex(Encoding.GetEncoding("utf-16").GetBytes("´o¸sçPQ](ÂF\u0012…a")));
}
public static string BytesToHex(byte[] bytes)
{
// whatever way to convert bytes to hex
return "0x" + BitConverter.ToString(bytes).Replace("-", "");
}
Trying to convert a huge hex string to a binary string, but the OverflowException keeps gets thrown. This is my code to convert an image file to a hex string (which when used with a FlowDocument works perfectly!):
string h = new System.Runtime.Remoting.Metadata.W3cXsd2001.SoapHexBinary(System.IO.File.ReadAllBytes(Path)).ToString();
Now, however, I want to take this hex string and convert it to a binary string so that it may also displayed in FlowDocument. First, I tried writing it to a temp text file and then attempt to read it into a byte array:
string TempPath = System.IO.Path.Combine(System.IO.Path.GetTempPath(), "Text.txt");
using (System.IO.StreamWriter sw = new System.IO.StreamWriter(TempPath))
{
sw.WriteLine(Convert.ToString(Convert.ToInt64(h, 16), 2).PadLeft(12, '0'));
}
byte[] c = System.IO.File.ReadAllBytes(TempPath);
When that didn't work, I tried reading it into a string:
string c = System.IO.File.ReadAll(TempPath);
Neither worked and still throw OverflowException. I have also tried just doing this and skipped writing to a file altogether:
string s = Convert.ToString(Convert.ToInt64(h, 16), 2).PadLeft(12, '0')
And despite what approach I take, I still get an exception thrown. How are large strings like this normally handled?
Update
I've modified my algorithm to convert one character at a time, so now it looks like this:
string NewBinary = "";
try
{
int i = 0;
foreach (char c in h)
{
if (i == 100) break;
NewBinary = string.Concat(NewBinary, Convert.ToString(Convert.ToInt64(c.ToString(), 16), 2).PadLeft(12, '0'));
i++;
}
}
The problem with this is that the string is always going to be super long and the code above takes a LONG time to generate the binary string. I limited the length to 100 to test conversion, so the conversion itself is not an issue.
An int64 is represented by a 16 character hex string, which is why attempting to convert a "huge string" causes an OverflowException - the value is more than can be represented by an int64. You will need to break the string up into groups of max 16 chars & convert those to binary & concatenate them.
You could convert a nibble at a time using a lookup array, for example:
public static string HexStringToBinaryString(string hexString)
{
var result = new StringBuilder();
string[] lookup =
{
"0000", "0001", "0010", "0011",
"0100", "0101", "0110", "0111",
"1000", "1001", "1010", "1011",
"1100", "1101", "1110", "1111"
};
foreach (char nibble in hexString.Select(char.ToUpper))
result.Append((nibble > '9') ? lookup[10+nibble-'A'] : lookup[nibble-'0']);
return result.ToString();
}
Convert each hex character of the string into its corresponding binary pattern (eg A becomes 1010 etc)
I am working on a project in Unity which uses Assembly C#. I try to get special character such as é, but in the console it just displays a blank character: "". For instance translating "How are you?" Should return "Cómo Estás?", but it returns "Cmo Ests". I put the return string "Cmo Ests" in a character array and realized that it is a non-null blank character. I am using Encoding.UTF8, and when I do:
char ch = '\u00e9';
print (ch);
It will print "é". I have tried getting the bytes off of a given string using:
byte[] utf8bytes = System.Text.Encoding.UTF8.GetBytes(temp);
While translating "How are you?", it will return a byte string, but for the special characters such as é, I get the series of bytes 239, 191, 189, which is a replacement character.
What type of information do I need to retrieve from the characters in order to accurately determining what character it is? Do I need to do something with the information that Google gives me, or is it something else? I am need a general case that I can place in my program and will work for any input string. If anyone can help, it would be greatly appreciated.
Here is the code that is referenced:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using UnityEngine;
using System.Collections;
using System.Net;
using HtmlAgilityPack;
public class Dictionary{
string[] formatParams;
HtmlDocument doc;
string returnString;
char[] letters;
public char[] charString;
public Dictionary(){
formatParams = new string[2];
doc = new HtmlDocument();
returnString = "";
}
public string Translate(String input, String languagePair, Encoding encoding)
{
formatParams[0]= input;
formatParams[1]= languagePair;
string url = String.Format("http://www.google.com/translate_t?hl=en&ie=UTF8&text={0}&langpair={1}", formatParams);
string result = String.Empty;
using (WebClient webClient = new WebClient())
{
webClient.Encoding = encoding;
result = webClient.DownloadString(url);
}
doc.LoadHtml(result);
input = alter (input);
string temp = doc.DocumentNode.SelectSingleNode("//span[#title='"+input+"']").InnerText;
charString = temp.ToCharArray();
return temp;
}
// Use this for initialization
void Start () {
}
string alter(string inputString){
returnString = "";
letters = inputString.ToCharArray();
for(int i=0; i<inputString.Length;i++){
if(letters[i]=='\''){
returnString = returnString + "'";
}else{
returnString = returnString + letters[i];
}
}
return returnString;
}
}
Maybe you should use another API/URL. This function below uses a different url that returns JSON data and seems to work better:
public static string Translate(string input, string fromLanguage, string toLanguage)
{
using (WebClient webClient = new WebClient())
{
string url = string.Format("http://translate.google.com/translate_a/t?client=j&text={0}&sl={1}&tl={2}", Uri.EscapeUriString(input), fromLanguage, toLanguage);
string result = webClient.DownloadString(url);
// I used JavaScriptSerializer but another JSON parser would work
JavaScriptSerializer serializer = new JavaScriptSerializer();
Dictionary<string, object> dic = (Dictionary<string, object>)serializer.DeserializeObject(result);
Dictionary<string, object> sentences = (Dictionary<string, object>)((object[])dic["sentences"])[0];
return (string)sentences["trans"];
}
}
If I run this in a Console App:
Console.WriteLine(Translate("How are you?", "en", "es"));
It will display
¿Cómo estás?
I don't know much about the GoogleTranslate API, but my first thought is that you've got a Unicode Normalization problem.
Have a look at System.String.Normalize() and it's friends.
Unicode is very complicated, so I'll over simplify! Many symbols can be represented in different ways in Unicode, that is: 'é' could be represented as 'é' (one character), or as an 'e' + 'accent character' (two characters), or, depending what comes back from the API, something else altogether.
The Normalize function will convert your string to one with the same Textual meaning, but potentially a different binary value which may fix your output problem.
You actually pretty much have it. Just insert the coded letter with a \u and it works.
string mystr = "C\u00f3mo Est\u00e1s?";
There are several issues with your approach. First of all the UTF8 encoding is a multibyte encoding. This means that if you use any non-ASCII character (having char code > 127), you will get a series of special characters that indicate to the system that this is an Unicode char. So actually your sequence 239, 191, 189 indicates a single character which is not an ASCII character. If you use UTF16, then you get fixed-size encodings (2-byte encodings) which actually map a character to an unsigned short (0-65535).
The char type in c# is a two-byte type, so it is actually an unsigned short. This contrasts with other languages, such as C/C++ where the char type is a 1-byte type.
So in your case, unless you really need to be using byte[] arrays, you should use char[] arrays. Or if you want to encode the characters so that they can be used in HTML, then you can just iterate through the characters and check if the character code is > 128, then you can replace it with the &hex; character code.
I had the same problem working one of my project [Language Resource Localization Translation]
I was doing the same thing and was using.. System.Text.Encoding.UTF8.GetBytes() and because of utf8 encoding was receiving special characters like your
e.g 239, 191, 189 in result string.
please take a look of my solution... hope this helps
Don't Use encoding at all Google translation will return correct like á as it self in the string. do some string manipulation and read the string as it is...
Generic Solution [works for every language translation which google support]
try
{
//Don't use UtF Encoding
// use default webclient encoding
var url = String.Format("http://www.google.com/translate_t?hl=en&text={0}&langpair={1}", "►" + txtNewResourceValue.Text.Trim() + "◄", "en|" + item.Text.Substring(0, 2));
var webClient = new WebClient();
string result = webClient.DownloadString(url); //get all data from google translate in UTF8 coding..
int start = result.IndexOf("id=result_box");
int end = result.IndexOf("id=spell-place-holder");
int length = end - start;
result = result.Substring(start, length);
result = reverseString(result);
start = result.IndexOf(";8669#&");//◄
end = result.IndexOf(";8569#&"); //►
length = end - start;
result = result.Substring(start +7 , length - 8);
objDic2.Text = reverseString(result);
//hard code substring; finding the correct translation within the string.
dictList.Add(objDic2);
}
catch (Exception ex)
{
lblMessages.InnerHtml = "<strong>Google translate exception occured no resource saved..." + ex.Message + "</strong>";
error = true;
}
public static string reverseString(string s)
{
char[] arr = s.ToCharArray();
Array.Reverse(arr);
return new string(arr);
}
as you can see from the code no encoding has been performed and i am sending 2 special key charachters as "►" + txtNewResourceValue.Text.Trim() + "◄"to determine the start and end of the return translation from google.
Also i have checked hough my language utility tool I am getting "Cómo Estás?" when sending
How are you to google translation... :)
Best regards
[Shaz]
---------------------------Edited-------------------------
public string Translate(String input, String languagePair)
{
try
{
//Don't use UtF Encoding
// use default webclient encoding
//input [string to translate]
//Languagepair [eg|es]
var url = String.Format("http://www.google.com/translate_t?hl=en&text={0}&langpair={1}", "►" + input.Trim() + "◄", languagePair);
var webClient = new WebClient();
string result = webClient.DownloadString(url); //get all data from google translate
int start = result.IndexOf("id=result_box");
int end = result.IndexOf("id=spell-place-holder");
int length = end - start;
result = result.Substring(start, length);
result = reverseString(result);
start = result.IndexOf(";8669#&");//◄
end = result.IndexOf(";8569#&"); //►
length = end - start;
result = result.Substring(start + 7, length - 8);
//return transalted string
return reverseString(result);
}
catch (Exception ex)
{
return "Google translate exception occured no resource saved..." + ex.Message";
}
}
I'm trying to figure out a way to parse out a base64 string from with a larger string.
I have the string "Hello <base64 content> World" and I want to be able to parse out the base64 content and convert it back to a string. "Hello Awesome World"
Answers in C# preferred.
Edit: Updated with a more real example.
--abcdef
\n
Content-Type: Text/Plain;
Content-Transfer-Encoding: base64
\n
<base64 content>
\n
--abcdef--
This is taken from 1 sample. The problem is that the Content.... vary quite a bit from one record to the next.
There is no reliable way to do it. How would you know that, for instance, "Hello" is not a base64 string ? OK, it's a bad example because base64 is supposed to be padded so that the length is a multiple of 4, but what about "overflow" ? It's 8-character long, it is a valid base64 string (it would decode to "¢÷«~Z0"), even though it's obviously a normal word to a human reader. There's just no way you can tell for sure whether a word is a normal word or base64 encoded text.
The fact that you have base64 encoded text embedded in normal text is clearly a design mistake, I suggest you do something about it rather that trying to do something impossible...
In short form you could:
split the string on any chars that are not valid base64 data or padding
try to convert each token
if the conversion succeeds, call replace on the original string to switch the token with the converted value
In code:
var delimiters = new char[] { /* non-base64 ASCII chars */ };
var possibles = value.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
//need to tweak to include padding chars in matches, but still split on padding?
//maybe better off creating a regex to match base64 + padding
//and using Regex.Split?
foreach(var match in possibles)
{
try
{
var converted = Convert.FromBase64String(match);
var text = System.Text.Encoding.UTF8.GetString(converted);
if(!string.IsNullOrEmpty(text))
{
value = value.Replace(match, text);
}
}
catch (System.ArgumentNullException)
{
//handle it
}
catch (System.FormatException)
{
//handle it
}
}
Without a delimiter though, you can end up converting non-base64 text that happens to be also be valid as base64 encoded text.
Looking at your example of trying to convert "Hello QXdlc29tZQ== World" to "Hello Awesome World" the above algorithm could easily generate something like "ée¡Ý•Í½µ”¢¹]" by trying to convert the whole string from base64 since there is no delimiter between plain and encoded text.
Update (based on comments):
If there are no '\n's in the base64 content and it is always preceded by "Content-Transfer-Encoding: base64\n", then there is a way:
split the string on '\n'
iterate over all the tokens until a token ends in "Content-Transfer-Encoding: base64"
the next token (if there are any) should be decoded (if possible) and then the replacement should be made in the original string
return to iterating until out of tokens
In code:
private string ConvertMixedUpTextAndBase64(string value)
{
var delimiters = new char[] { '\n' };
var possibles = value.Split(delimiters,
StringSplitOptions.RemoveEmptyEntries);
for (int i = 0; i < possibles.Length - 1; i++)
{
if (possibles[i].EndsWith("Content-Transfer-Encoding: base64"))
{
var nextTokenPlain = DecodeBase64(possibles[i + 1]);
if (!string.IsNullOrEmpty(nextTokenPlain))
{
value = value.Replace(possibles[i + 1], nextTokenPlain);
i++;
}
}
}
return value;
}
private string DecodeBase64(string text)
{
string result = null;
try
{
var converted = Convert.FromBase64String(text);
result = System.Text.Encoding.UTF8.GetString(converted);
}
catch (System.ArgumentNullException)
{
//handle it
}
catch (System.FormatException)
{
//handle it
}
return result;
}