Can't fully correct encoding issue from website [duplicate]

I have a string that I receive from a third-party app and I would like to display it correctly in any language using C# on my Windows Surface.
Due to incorrect encoding, a piece of my string looks like this in Spanish:
AcciÃ³n
whereas it should look like this:
Acción
According to the answer to this question:
How to know string encoding in C#, the string I receive should already be UTF-8, but it is being read with Encoding.Default (probably ANSI?).
I am trying to transform this string into real UTF-8, but one of the problems is that I can only see a subset of the Encoding class (only the UTF8 and Unicode properties), probably because I'm limited to the Windows Surface API.
I have tried some snippets I've found on the internet, but none of them have worked so far for East Asian languages (e.g. Korean). One example is as follows:
var utf8 = Encoding.UTF8;
byte[] utfBytes = utf8.GetBytes(myString);
myString= utf8.GetString(utfBytes, 0, utfBytes.Length);
I also tried extracting the string into a byte array and then using UTF8.GetString:
byte[] myByteArray = new byte[myString.Length];
for (int ix = 0; ix < myString.Length; ++ix)
{
char ch = myString[ix];
myByteArray[ix] = (byte) ch;
}
myString = Encoding.UTF8.GetString(myByteArray, 0, myString.Length);
Do you guys have any other ideas that I could try?

Since you know the string was decoded with Encoding.Default, you can simply use:
byte[] bytes = Encoding.Default.GetBytes(myString);
myString = Encoding.UTF8.GetString(bytes);
Another thing to remember: if you use Console.WriteLine to output strings, you should also set Console.OutputEncoding = System.Text.Encoding.UTF8; otherwise all UTF-8 strings will be output in the console's current code page (for example GBK on a Chinese-locale system).
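The following is a minimal sketch of the same repair. It passes the wrong encoding in explicitly instead of relying on Encoding.Default. This matters because Encoding.Default is the system ANSI code page only on .NET Framework. On .NET Core and .NET 5+ it is always UTF-8, so the two-line fix above does nothing there.

The class and method names are hypothetical. ISO-8859-1 is a safe choice for text like AcciÃ³n because it is available on every .NET runtime. Text such as dayâ€™s went through windows-1252, so pass Encoding.GetEncoding(1252) for it instead. On .NET Core and .NET 5+, that call only works after Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).

```csharp
using System.Text;

public static class MojibakeRepair
{
    // Reverses a wrong decode: re-encoding the garbled text with the encoding
    // that was (incorrectly) used to decode it recovers the original bytes,
    // which are then decoded as UTF-8, as they should have been in the first place.
    public static string Repair(string garbled, Encoding wronglyUsed)
    {
        byte[] originalBytes = wronglyUsed.GetBytes(garbled);
        return Encoding.UTF8.GetString(originalBytes);
    }
}
```

For example, MojibakeRepair.Repair("AcciÃ³n", Encoding.GetEncoding("ISO-8859-1")) returns "Acción".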

string utf8String = "AcciÃ³n";
string propEncodeString = string.Empty;
byte[] utf8_Bytes = new byte[utf8String.Length];
for (int i = 0; i < utf8String.Length; ++i)
{
utf8_Bytes[i] = (byte)utf8String[i];
}
propEncodeString = Encoding.UTF8.GetString(utf8_Bytes, 0, utf8_Bytes.Length);
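Output should look like:
Acción
Likewise, this turns dayâ€™s into:
day’s
Call DecodeFromUtf8():
private static string DecodeFromUtf8()
{
string utf8_String = "dayâ€™s";
// Relies on Encoding.Default being windows-1252 (.NET Framework on a Western-locale
// Windows); the (byte) cast above would truncate € (U+20AC) and ™ (U+2122).
byte[] bytes = Encoding.Default.GetBytes(utf8_String);
// Return the repaired text instead of discarding it.
return Encoding.UTF8.GetString(bytes);
}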
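The manual (byte) loop keeps only the low 8 bits of each char. It therefore behaves like ISO-8859-1 for characters up to U+00FF and silently corrupts anything above that. That is why it fixes AcciÃ³n but not dayâ€™s, whose € (U+20AC) and ™ (U+2122) come from windows-1252.

Here is a small sketch that makes the loss visible. The class and method names are hypothetical.

```csharp
public static class ByteCast
{
    // Mirrors the loop above: each UTF-16 char is truncated to its low byte,
    // so U+20AC ('€') becomes 0xAC instead of the original windows-1252 byte 0x80.
    public static byte[] LowBytes(string s)
    {
        var bytes = new byte[s.Length];
        for (int i = 0; i < s.Length; i++)
            bytes[i] = (byte)s[i];
        return bytes;
    }

    // True only if no char would be truncated, i.e. the cast is lossless.
    public static bool IsLossless(string s)
    {
        foreach (char c in s)
            if (c > '\u00FF') return false;
        return true;
    }
}
```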

Your code is reading a sequence of UTF-8-encoded bytes and decoding them with an 8-bit encoding.
You need to fix that code so that it decodes the bytes as UTF-8.
Alternatively (not ideal), you could convert the bad string back to the original byte array by encoding it with the incorrect encoding, then re-decode the bytes as UTF-8.
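Fixing the problem at the source means telling the reader up front that the bytes are UTF-8. The sketch below assumes the third-party data reaches your code as a Stream. The class and method names are hypothetical.

```csharp
using System.IO;
using System.Text;

public static class SourceDecoding
{
    // Decodes the stream as UTF-8 from the start, so no mojibake is produced.
    // A UTF-16 or UTF-32 byte order mark, if present, still overrides the default.
    // Note: disposing the reader also closes the underlying stream.
    public static string ReadAllUtf8(Stream source)
    {
        using (var reader = new StreamReader(source, Encoding.UTF8, detectEncodingFromByteOrderMarks: true))
        {
            return reader.ReadToEnd();
        }
    }
}
```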

byte[] utf8Bytes = Encoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(mystring));
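Encoding.Convert(source, target, bytes) returns a byte[], not a string. It decodes the bytes with source and re-encodes the resulting text with target. Because it transcodes text rather than reinterpreting bytes, decoding its UTF-8 output gives back the same characters that went in. On its own, therefore, it does not undo mojibake.

The sketch below makes both points checkable. The class and method names are hypothetical, and ISO-8859-1 stands in for Encoding.Default so that the behavior is the same on every runtime.

```csharp
using System.Text;

public static class ConvertDemo
{
    // Equivalent to target.GetBytes(source.GetString(bytes)): the result is bytes.
    public static byte[] Transcode(Encoding source, Encoding target, byte[] bytes)
    {
        return Encoding.Convert(source, target, bytes);
    }

    // The full Convert call from the answer above, decoded back to a string.
    // The text comes back unchanged, still garbled if it was garbled going in.
    public static string ConvertRoundTrip(string text, Encoding source)
    {
        byte[] utf8Bytes = Encoding.Convert(source, Encoding.UTF8, source.GetBytes(text));
        return Encoding.UTF8.GetString(utf8Bytes);
    }
}
```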

@anothershrubery's answer worked for me. I've made an enhancement using a StringExtensions class so I can easily convert any string in my program.
Method:
public static class StringExtensions
{
public static string ToUTF8(this string text)
{
return Encoding.UTF8.GetString(Encoding.Default.GetBytes(text));
}
}
Usage:
string myString = "AcciÃ³n";
string strConverted = myString.ToUTF8();
Or simply:
string strConverted = "AcciÃ³n".ToUTF8();
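The extension above will also mangle a string that was never garbled. Re-encoding Acción with a single-byte code page yields the byte F3 followed by n, which is not valid UTF-8, so UTF8.GetString substitutes U+FFFD.

Below is a sketch of a more defensive variant. It assumes you know which encoding was wrongly applied, and the method name is hypothetical. It only re-decodes when the re-encoding is lossless and the recovered bytes are valid UTF-8. This is still a heuristic: a genuine Latin-1 string that happens to form valid UTF-8 would be converted anyway.

```csharp
using System.Text;

public static class StrictStringExtensions
{
    // UTF-8 decoder that throws on invalid byte sequences instead of
    // silently substituting U+FFFD.
    private static readonly Encoding StrictUtf8 =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    // Re-decodes text as UTF-8 only when that is lossless; otherwise returns it unchanged.
    public static string ToUtf8IfMisdecoded(this string text, Encoding wronglyUsed)
    {
        // Same code page as wronglyUsed, but unmappable characters throw
        // instead of turning into '?'.
        Encoding strictSource = Encoding.GetEncoding(
            wronglyUsed.CodePage, EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);
        try
        {
            byte[] bytes = strictSource.GetBytes(text);
            return StrictUtf8.GetString(bytes);
        }
        catch (EncoderFallbackException)
        {
            return text; // contains characters the wrong encoding could never have produced
        }
        catch (DecoderFallbackException)
        {
            return text; // recovered bytes are not valid UTF-8, so the text was probably fine
        }
    }
}
```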

If you want to save any string to a MySQL database, do this:
1) Your database field's collation in phpMyAdmin (or any other control panel) should be set to utf8_general_ci.
2) Convert your string (e.g. textBox1.Text) to bytes:
2-1) define byte[] st2;
2-2) convert your string (textBox1.Text) to UTF-8 (a multibyte encoding) with:
byte[] st2 = System.Text.Encoding.UTF8.GetBytes(textBox1.Text);
3) Execute this SQL command before any query:
string mysql_query2 = "SET NAMES 'utf8'";
cmd.CommandText = mysql_query2;
cmd.ExecuteNonQuery();
3-2) Now insert the value into, for example, the name field with:
cmd.CommandText = "INSERT INTO customer (`name`) values (@name)";
4) The key step that many solutions overlook is the line below:
use AddWithValue instead of Add for the command parameter, like this:
cmd.Parameters.AddWithValue("@name", st2);
Enjoy real data in your database server instead of ????

Use the snippet below to read a CSV file and get its content as UTF-8 bytes:
protected byte[] GetCSVFileContent(string fileName)
{
StringBuilder sb = new StringBuilder();
using (StreamReader sr = new StreamReader(fileName, Encoding.Default, true))
{
String line;
// Read lines from the file until the end of
// the file is reached.
while ((line = sr.ReadLine()) != null)
{
sb.AppendLine(line);
}
}
string allines = sb.ToString();
UTF8Encoding utf8 = new UTF8Encoding();
// Note: new UTF8Encoding() emits no BOM, so this preamble is empty and is not used below.
var preamble = utf8.GetPreamble();
var data = utf8.GetBytes(allines);
return data;
}
Then call it and send the result to the client as an attachment:
Encoding csvEncoding = Encoding.UTF8;
//byte[] csvFile = GetCSVFileContent(FileUpload1.PostedFile.FileName);
byte[] csvFile = GetCSVFileContent("Your_CSV_File_NAme");
string attachment = String.Format("attachment; filename={0}.csv", "uomEncoded");
Response.Clear();
Response.ClearHeaders();
Response.ClearContent();
Response.ContentType = "text/csv";
Response.ContentEncoding = csvEncoding;
Response.AppendHeader("Content-Disposition", attachment);
//Response.BinaryWrite(csvEncoding.GetPreamble());
Response.BinaryWrite(csvFile);
Response.Flush();
Response.End();
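Here is a shorter sketch of the same idea, assuming you know the encoding the CSV was saved in. File.ReadAllText decodes the file and honors a byte order mark if one is present. The UTF-8 preamble (EF BB BF) can optionally be prepended, which is commonly relied on so that programs such as Excel recognize the output as UTF-8. The class name, method name and BOM flag are assumptions.

```csharp
using System.IO;
using System.Text;

public static class CsvBytes
{
    // Reads a text file in the given source encoding and returns its content
    // as UTF-8 bytes, optionally prefixed with the UTF-8 BOM.
    public static byte[] ReadAsUtf8(string path, Encoding sourceEncoding, bool includeBom)
    {
        string text = File.ReadAllText(path, sourceEncoding);
        byte[] body = Encoding.UTF8.GetBytes(text);
        if (!includeBom)
            return body;

        byte[] bom = Encoding.UTF8.GetPreamble(); // EF BB BF
        byte[] result = new byte[bom.Length + body.Length];
        bom.CopyTo(result, 0);
        body.CopyTo(result, bom.Length);
        return result;
    }
}
```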

Related

C# UTF8 Decoding, returning bytes/numbers instead of string

I'm having an issue decoding a file using UTF8Encoding.
I am reading text from a file that I have encoded with UTF-8 (string to bytes).
See the following method.
public static void Encode(string Path)
{
string text;
Byte[] bytes;
using (StreamReader sr = new StreamReader(Path))
{
text = sr.ReadToEnd();
UTF8Encoding Encoding = new UTF8Encoding();
bytes = Encoding.GetBytes(text);
sr.Close();
}
using (StreamWriter sw = new StreamWriter(Path))
{
foreach (byte b in bytes)
sw.Write(b.ToString());
sw.Close();
}
}
I then decode it using the method
public static String Decode(string Path)
{
String text;
Byte[] bytes;
using (StreamReader sr = new StreamReader(Path))
{
text = sr.ReadToEnd();
UTF8Encoding Encoding = new UTF8Encoding();
bytes = Encoding.GetBytes(text);
text = Encoding.GetString(bytes);
return text;
}
}
But instead of decoding the byte to have it come back to text, it just returns it as a string of numbers. I can't see what I am doing wrong as I don't really have much experience with this.
EDIT: To clarify what I'm trying to achieve: I'm trying to have a text file store the text as bytes rather than as chars/numbers. This is to provide a very simple encryption for the files, so that you can't modify them unless you know what you're doing. The Decode function is then used to read the text (bytes) from the file and turn it back into readable text. I hope this clarifies what I'm trying to achieve.
PS: Sorry for the lack of comments, but I think the code is short enough to be understandable.

What exactly are you trying to achieve? UTF-8 (like every other encoding) is a method for converting strings to byte arrays (text to raw data) and vice versa. StreamReader and StreamWriter are used to read and write strings from and to files. There is no need to re-encode anything there: reader.ReadToEnd() alone will return the correct string.
Your code seems to attempt to write a file containing a list of numbers (as a readable, textual representation) corresponding to the UTF-8 bytes of the given text. OK. Even though this is a very strange idea (I hope you are not trying to do anything like “encryption” with it), it is definitely possible, if that’s really what you want to do. But you need to separate the readable numbers somehow, e.g. with newlines, and parse them when reading them back:
public static void Encode(string path)
{
byte[] bytes;
using (var sr = new StreamReader(path))
{
var text = sr.ReadToEnd();
bytes = Encoding.UTF8.GetBytes(text);
}
using (var sw = new StreamWriter(path))
{
foreach (byte b in bytes)
{
sw.WriteLine(b);
}
}
}
public static void Decode(string path)
{
var data = new List<byte>();
using (var sr = new StreamReader(path))
{
string line;
while((line = sr.ReadLine()) != null)
data.Add(Byte.Parse(line));
}
using (var sw = new StreamWriter(path))
{
sw.Write(Encoding.UTF8.GetString(data.ToArray()));
}
}

This code decodes a Base64-encoded string back to text; it worked on my side.
public static String Decode(string Path)
{
String text;
using (StreamReader sr = new StreamReader(Path))
{
text = sr.ReadToEnd();
byte[] bytes = Convert.FromBase64String(text);
System.Text.UTF8Encoding encoder = new System.Text.UTF8Encoding();
System.Text.Decoder decoder = encoder.GetDecoder();
int count = decoder.GetCharCount(bytes, 0, bytes.Length);
char[] arr = new char[count];
decoder.GetChars(bytes, 0, bytes.Length, arr, 0);
text= new string(arr);
return text;
}
}

The StreamReader class will handle decoding for you, so your Decode() method can be as simple as this:
public static string Decode(string path)
{
// This StreamReader constructor defaults to UTF-8
using (StreamReader reader = new StreamReader(path))
return reader.ReadToEnd();
}
I'm not sure what your Encode() method is supposed to do, since the intent seems to be to read a file as UTF-8 and then write the text back to the exact same file as UTF-8. Something like this might make more sense:
public static void Encode(string path, string text)
{
// This StreamWriter constructor defaults to UTF-8
using (StreamWriter writer = new StreamWriter(path))
writer.Write(text);
}
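To see what the default constructors actually put on disk, here is a small sketch that writes through new StreamWriter(path) and returns the raw file bytes. The class and method names are hypothetical. For Acción you get seven UTF-8 bytes, with ó stored as C3 B3 and no byte order mark in front.

```csharp
using System.IO;

public static class DefaultEncodingCheck
{
    // Writes text with the default StreamWriter(path) constructor
    // (UTF-8 without a BOM) and returns the raw bytes that ended up on disk.
    public static byte[] BytesWrittenByDefault(string text)
    {
        string path = Path.GetTempFileName();
        try
        {
            using (var writer = new StreamWriter(path))
                writer.Write(text);
            return File.ReadAllBytes(path);
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```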

Is there anything wrong with this RC4 encryption code in C#

I am trying to listen to the FoxyCart XML Datafeed in C# and am running into an issue that boils down to encryption.
In short, they send their data as encoded and encrypted XML using RC4 encryption.
For testing, they provide some (user-submitted) sample code for C#. I tried this sample RC4 decryption code, provided by one of the users, but it doesn't seem to work, and their support staff think it's down to the C# RC4 algorithm. Since they are not C# experts, I figured I would ask here. Here is the post on the FoxyCart forum.
Anyway, here is the code that (tries to) simulate the response by encrypting an XML file and posting it to a URL (NOTE that DataFeedKey is a string that I have stored as a member variable):
public ActionResult TestDataFeed()
{
string transactionData = (new StreamReader(@"D:\SampleFeed.xml")).ReadToEnd();
string encryptedTransactionData = RC4.Encrypt(DataFeedKey, transactionData, false);
string encodedTransactionData = HttpUtility.UrlEncode(encryptedTransactionData, Encoding.GetEncoding(1252));
string postData = "FoxyData=" + encodedTransactionData;
var req = (HttpWebRequest)WebRequest.Create("http://localhost:3396/FoxyCart/RecieveDataFeed");
req.Method = "POST";
req.ContentType = "application/x-www-form-urlencoded";
var sw = new StreamWriter(req.GetRequestStream(), Encoding.ASCII);
sw.Write(postData);
sw.Close();
HttpWebResponse resp = null;
try
{
resp = (HttpWebResponse)req.GetResponse();
string r = new StreamReader(resp.GetResponseStream()).ReadToEnd();
}
catch (WebException ex)
{
string err = new StreamReader(ex.Response.GetResponseStream()).ReadToEnd();
}
return null;
}
and here is the callback method that receives the response.
[ValidateInput(false)]
public ActionResult RecieveDataFeed(FormCollection collection)
{
string unencodedFeed = HttpUtility.UrlDecode(collection["FoxyData"], Encoding.GetEncoding(1252));
string transaction = RC4.Decrypt(DataFeedKey, unencodedFeed, false);
return Content("foxy");
}
Instead of posting the whole RC4 class inline in this question, here is a link to the code of this RC4 class.
As I posted in the link at the top of the question, the issue is that when I check the transaction variable inside the RecieveDataFeed() method, I should get the regular XML back, but instead I see this:
É?xø´ v´“Û·8êUŸí¥MïSÅJÖó5Cå7ã…ÄlÞ&þòG·¶ÝÙ3<ÍÖ¡«úüF¿¿ßìNµ>4¦Äu÷¼Â;£-w¤ƒûÊyL¹®½èíYö½’é(µJŒ~»»=3¼]F‡•=±Ùí]'é³«"øPç{Ù^yyéå–°ñ…5ðWF$zÉnÄ^_”Xë’ï%œ-5á
ÒÛ€jŠt`Â9œÇÞLU&¼~ç2îžúo/¢¶5,º*öOqÝ—‘.ó®šuf™å5G—õC®‰ÁéiÇúW®¦ÝÚ•Z±:„Q\p"p
ôÔiÛ!\D"ÉÂX3]ƒ°è€Œ«DQE‡kÝ#àö`gpöŽ÷nÛ={µÏßKQKüå(ö%¯¯Ü–9}¨¬°£7yo,«”ÜëCÍ/+…†ÕËî‘‹‰AÚmÇÙå©&©¡xÙkŒföX¯ÃX&×°S|kÜ6Ô°Üú\Ätóü-äUÆ†ÈáÅ\ ’E8‚¤âÈ4Ž¾«ãÎš_Sï£y‰xJº•bm*jo›‰ÜW–[ô†ÆJÐà$½…9½šžˆ_ÙÜù/®öÁVhzŠ¥ú(ñ£²6ˆb6¢ëße¤oáIðZuK}ÆÙ]"T¼*åZêñß5K—½òQSåRN Çë'Å¡
Ã•yiÈX •bØðIk¿WxwNàäx®‹?cv+X™¥E!gd4â¤nÔ‹¢½Ð”ªÊQ!‚.e8s
Gyª4¼ò,}Yœ‚¹”±E‡Jy}Sæ
ƒ¦ýK'Ð}~B¦E3!0°ú´A–5Þ³£9$–8äÏ©?
œ‡8GÂø
The code looks right:
Encrypt
Encode
Decode
Decrypt
but it doesn't seem to be working. Any suggestions on what might be wrong above?

I am a bit surprised by the code in the RC4 class. I can't see how it would work reliably.
The code uses windows-1252 encoding to encode characters into bytes, then it encrypts the bytes and tries to decode the encrypted bytes into characters. That won't work reliably, as you can only reliably decode bytes that came from encoding characters.
The method takes a string and returns a string, but it should take a byte array and return a byte array, similar to how all the encryption classes in the framework do it.
Here is a version that works like that:
public class RC4 {
public static byte[] Encrypt(byte[] pwd, byte[] data) {
int a, i, j, k, tmp;
int[] key, box;
byte[] cipher;
key = new int[256];
box = new int[256];
cipher = new byte[data.Length];
for (i = 0; i < 256; i++) {
key[i] = pwd[i % pwd.Length];
box[i] = i;
}
for (j = i = 0; i < 256; i++) {
j = (j + box[i] + key[i]) % 256;
tmp = box[i];
box[i] = box[j];
box[j] = tmp;
}
for (a = j = i = 0; i < data.Length; i++) {
a++;
a %= 256;
j += box[a];
j %= 256;
tmp = box[a];
box[a] = box[j];
box[j] = tmp;
k = box[((box[a] + box[j]) % 256)];
cipher[i] = (byte)(data[i] ^ k);
}
return cipher;
}
public static byte[] Decrypt(byte[] pwd, byte[] data) {
return Encrypt(pwd, data);
}
}
Example:
string data = "This is a test.";
byte[] key = { 1, 2, 3, 4, 5 };
// encrypt
byte[] enc = RC4.Encrypt(key, Encoding.UTF8.GetBytes(data));
// turn into base64 for convenient transport as form data
string base64 = Convert.ToBase64String(enc);
Console.WriteLine(base64);
// turn back into byte array
byte[] code = Convert.FromBase64String(base64);
// decrypt
string dec = Encoding.UTF8.GetString(RC4.Decrypt(key, code));
Console.WriteLine(dec);
Output:
5lEKdtBUswet4yYveWU2
This is a test.
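If the encrypted bytes are sent as form data, as in the question's TestDataFeed, note that Base64 output can contain +, / and =. A bare + in application/x-www-form-urlencoded data is decoded as a space.

Here is a sketch of both directions. It uses Uri.EscapeDataString in place of the question's HttpUtility.UrlEncode call, and the class and method names are hypothetical.

```csharp
using System;

public static class FormTransport
{
    // Ciphertext bytes -> Base64 -> percent-encoded, safe to append after "FoxyData=".
    public static string ToFormValue(byte[] cipher)
    {
        return Uri.EscapeDataString(Convert.ToBase64String(cipher));
    }

    // Reverse of ToFormValue for a raw, still percent-encoded value.
    // If the framework has already URL-decoded the field, pass it straight
    // to Convert.FromBase64String instead.
    public static byte[] FromFormValue(string encoded)
    {
        return Convert.FromBase64String(Uri.UnescapeDataString(encoded));
    }
}
```

In ASP.NET MVC, values read from a FormCollection have typically already been URL-decoded. On the receiving side, Convert.FromBase64String(collection["FoxyData"]) may therefore be all you need. Running the value through HttpUtility.UrlDecode again, as the question does, would turn any + into a space.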

Although this is more of a shot in the dark... I am fairly certain that the class implementing RC4 assumes everything is either ASCII or code page 1252. Both are wrong, because I assume the XML supplied is UTF-8, and the in-memory .NET string representation is UTF-16...
If my assumption is right, the data is already scrambled when you get it back from encryption...
EDIT - some links to working RC4 code in C#:
http://tofuculture.com/Blog/post/RC4-Encryption-in-C.aspx
http://dotnet-snippets.com/dns/rc4-encryption-SID577.aspx
http://www.codeproject.com/KB/recipes/rc4csharp.aspx
http://icodesnip.com/snippet/csharp/rc4-encryption-code-snippets

Unicode-to-string conversion in C#

How can I convert a Unicode value to its equivalent string?
For example, I have "రమెశ్", and I need a function that accepts this Unicode value and returns a string.
I was looking at the System.Text.Encoding.Convert() function, but that does not take in a Unicode value; it takes two encodings and a byte array.
I basically have a byte array that I need to save in a string field, and then later convert the string back to a byte array.
So I use ByteConverter.GetString(byteArray) to save the byte array to a string, but I can't get it back to a byte array.

Use .ToString():
this.Text = ((char)0x00D7).ToString();

Try the following:
byte[] bytes = ...;
string convertedUtf8 = Encoding.UTF8.GetString(bytes);
string convertedUtf16 = Encoding.Unicode.GetString(bytes); // For UTF-16
The other way around uses GetBytes():
byte[] bytesUtf8 = Encoding.UTF8.GetBytes(convertedUtf8);
byte[] bytesUtf16 = Encoding.Unicode.GetBytes(convertedUtf16);
In the Encoding class, there are more variants if you need them.
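The question's scenario is storing an arbitrary byte array in a string field and getting the same bytes back later. A text encoding is the wrong tool for that: GetString replaces byte sequences that are not valid in that encoding, so the round trip is lossy. Base64 is designed for exactly this.

Here is a minimal sketch. The class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class ByteStringRoundTrip
{
    // Lossy for arbitrary data: invalid UTF-8 sequences become U+FFFD.
    public static byte[] ViaUtf8(byte[] data)
    {
        return Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(data));
    }

    // Lossless: every byte array has a Base64 form, and it converts back exactly.
    public static byte[] ViaBase64(byte[] data)
    {
        string stored = Convert.ToBase64String(data); // safe to keep in a string field
        return Convert.FromBase64String(stored);
    }
}
```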

To convert a string to a Unicode string, do it like this. It's very simple: note the BytesToString function, which lets StreamReader do the decoding.
private string BytesToString(byte[] Bytes, Encoding encoding)
{
// StreamReader decodes with the given encoding unless the bytes start with a BOM
using (MemoryStream MS = new MemoryStream(Bytes))
using (StreamReader SR = new StreamReader(MS, encoding))
{
return SR.ReadToEnd();
}
}
private string ToUnicode(string S)
{
// GetBytes emits no BOM, so the reader must be told that these bytes are UTF-16
return BytesToString(new UnicodeEncoding().GetBytes(S), Encoding.Unicode);
}

UTF8Encoding Class
UTF8Encoding uni = new UTF8Encoding();
Console.WriteLine( uni.GetString(new byte[] { 1, 2 }));

There are different types of encoding. You can try some of them to see whether your byte stream gets converted correctly:
System.Text.ASCIIEncoding encodingASCII = new System.Text.ASCIIEncoding();
System.Text.UTF8Encoding encodingUTF8 = new System.Text.UTF8Encoding();
System.Text.UnicodeEncoding encodingUNICODE = new System.Text.UnicodeEncoding();
var ascii = string.Format("{0}: {1}", encodingASCII.ToString(), encodingASCII.GetString(textBytesASCII));
var utf = string.Format("{0}: {1}", encodingUTF8.ToString(), encodingUTF8.GetString(textBytesUTF8));
var unicode = string.Format("{0}: {1}", encodingUNICODE.ToString(), encodingUNICODE.GetString(textBytesCyrillic));
Have a look here as well: http://george2giga.com/2010/10/08/c-text-encoding-and-transcoding/.

var ascii = $"{new ASCIIEncoding()}: {new ASCIIEncoding().GetString(textBytesASCII)}";
var utf = $"{new UTF8Encoding()}: {new UTF8Encoding().GetString(textBytesUTF8)}";
var unicode = $"{new UnicodeEncoding()}: {new UnicodeEncoding().GetString(textBytesCyrillic)}";

I wrote a loop that converts \uXXXX escape sequences in a string into the actual characters:
string stringWithUnicodeSymbols = @"{""id"": 10440119, ""photo"": 10945418, ""first_name"": ""\u0415\u0432\u0433\u0435\u043d\u0438\u0439""}";
var splitted = Regex.Split(stringWithUnicodeSymbols, @"\\u([a-fA-F\d]{4})");
string outString = "";
foreach (var s in splitted)
{
try
{
if (s.Length == 4)
{
var decoded = ((char) Convert.ToUInt16(s, 16)).ToString();
outString += decoded;
}
else
{
outString += s;
}
}
catch (Exception e)
{
outString += s;
}
}
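The Split-based loop treats every 4-character piece as a possible code. Ordinary text that happens to be four hex digits long is therefore converted too: in \u0415abcd, the trailing abcd becomes U+ABCD.

Here is a sketch that uses Regex.Replace with a match evaluator instead, so only text that actually matched \uXXXX is touched. The class name is hypothetical. If the input really is JSON, as in the example, a JSON parser such as System.Text.Json will decode these escapes for you.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class UnicodeEscapes
{
    // A literal backslash, 'u', then exactly four hex digits, e.g. \u0415.
    private static readonly Regex EscapePattern = new Regex(@"\\u([0-9a-fA-F]{4})");

    // Replaces each \uXXXX sequence with the UTF-16 code unit it names and
    // leaves all other text (including plain 4-digit numbers) untouched.
    public static string Decode(string input)
    {
        return EscapePattern.Replace(input, m =>
            ((char)int.Parse(m.Groups[1].Value, NumberStyles.HexNumber)).ToString());
    }
}
```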

C# Convert string from UTF-8 to ISO-8859-1 (Latin1)

I have googled this topic and looked at every answer, but I still don't get it.
Basically, I need to convert a UTF-8 string to ISO-8859-1, and I do it using the following code:
Encoding iso = Encoding.GetEncoding("ISO-8859-1");
Encoding utf8 = Encoding.UTF8;
string msg = iso.GetString(utf8.GetBytes(Message));
My source string is
Message = "ÄäÖöÕõÜü"
But unfortunately my result string becomes
msg = "Ã?Ã¤Ã?Ã¶Ã?ÃµÃ?Ã¼"
What am I doing wrong here?

Use Encoding.Convert to adjust the byte array before attempting to decode it into your destination encoding.
Encoding iso = Encoding.GetEncoding("ISO-8859-1");
Encoding utf8 = Encoding.UTF8;
byte[] utfBytes = utf8.GetBytes(Message);
byte[] isoBytes = Encoding.Convert(utf8, iso, utfBytes);
string msg = iso.GetString(isoBytes);
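A .NET string has no byte encoding of its own, so the conversion only matters at the point where you produce bytes. The sketch below does two things. First, it reproduces the garbled output from the question by decoding UTF-8 bytes as ISO-8859-1. Second, it shows the direct way to get ISO-8859-1 bytes from the string. The class and method names are hypothetical.

```csharp
using System.Text;

public static class Latin1Demo
{
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // What the question's code does: UTF-8 bytes read back as ISO-8859-1.
    // "Ä" (U+00C4) is C3 84 in UTF-8, which ISO-8859-1 reads as "Ã" plus U+0084.
    public static string Misdecode(string message)
    {
        return Latin1.GetString(Encoding.UTF8.GetBytes(message));
    }

    // What is usually wanted: the string's characters as ISO-8859-1 bytes.
    public static byte[] ToLatin1Bytes(string message)
    {
        return Latin1.GetBytes(message);
    }
}
```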

I think your problem is that you assume the bytes that represent the UTF-8 string will produce the same string when interpreted as something else (ISO-8859-1). That is simply not the case. I recommend that you read this excellent article by Joel Spolsky.

Try this:
Encoding iso = Encoding.GetEncoding("ISO-8859-1");
Encoding utf8 = Encoding.UTF8;
byte[] utfBytes = utf8.GetBytes(Message);
byte[] isoBytes = Encoding.Convert(utf8,iso,utfBytes);
string msg = iso.GetString(isoBytes);

You need to fix the source of the string in the first place.
A string in .NET is actually just a sequence of 16-bit UTF-16 code units (chars), so a string has no byte encoding of its own.
It's when you take that string and convert it to a set of bytes that encoding comes into play.
In any case, encoding a string to a byte array with one character set and then decoding it with another, as you did, will not work, as you have seen.
Can you tell us more about where that original string comes from, and why you think it has been encoded wrong?

This seems like strange code. To get a string from a UTF-8 byte stream, all you need is:
string str = Encoding.UTF8.GetString(utf8ByteArray);
If you then need to save an ISO-8859-1 byte stream somewhere, add this line:
byte[] iso88591data = Encoding.GetEncoding("ISO-8859-1").GetBytes(str);

Maybe this can help.
Convert one code page to another:
public static string fnStringConverterCodepage(string sText, string sCodepageIn = "ISO-8859-8", string sCodepageOut="ISO-8859-8")
{
string sResultado = string.Empty;
try
{
byte[] tempBytes;
tempBytes = System.Text.Encoding.GetEncoding(sCodepageIn).GetBytes(sText);
sResultado = System.Text.Encoding.GetEncoding(sCodepageOut).GetString(tempBytes);
}
catch (Exception)
{
sResultado = "";
}
return sResultado;
}
Usage:
string sMsg = "ERRO: NÃ£o foi possivel acessar o servico de AutenticaÃ§Ã£o";
var sOut = fnStringConverterCodepage(sMsg, "ISO-8859-1", "UTF-8");
Output:
"ERRO: Não foi possivel acessar o servico de Autenticação"

Encoding targetEncoding = Encoding.GetEncoding(1252);
// Encode a string into an array of bytes.
Byte[] encodedBytes = targetEncoding.GetBytes(utfString);
// Show the encoded byte values.
Console.WriteLine("Encoded bytes: " + BitConverter.ToString(encodedBytes));
// Decode the byte array back to a string with the same encoding.
String decodedString = targetEncoding.GetString(encodedBytes);

I just used Nathan's solution and it works fine. I needed to convert ISO-8859-1 to Unicode:
string isocontent = Encoding.GetEncoding("ISO-8859-1").GetString(fileContent, 0, fileContent.Length);
byte[] isobytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(isocontent);
byte[] ubytes = Encoding.Convert(Encoding.GetEncoding("ISO-8859-1"), Encoding.Unicode, isobytes);
return Encoding.Unicode.GetString(ubytes, 0, ubytes.Length);
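Decoding the file bytes with ISO-8859-1 already yields a complete .NET string, which is stored internally as UTF-16. The re-encode and Encoding.Convert steps should therefore not change the result. Here is a one-call sketch under that assumption. The class and method names are hypothetical.

```csharp
using System.Text;

public static class Latin1File
{
    // A .NET string is already UTF-16 internally, so decoding once is enough.
    public static string Decode(byte[] fileContent)
    {
        return Encoding.GetEncoding("ISO-8859-1").GetString(fileContent);
    }
}
```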

Here is a sample for ISO-8859-9:
protected void btnKaydet_Click(object sender, EventArgs e)
{
Response.Clear();
Response.Buffer = true;
Response.ContentType = "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
Response.AddHeader("Content-Disposition", "attachment; filename=XXXX.doc");
Response.ContentEncoding = Encoding.GetEncoding("ISO-8859-9");
Response.Charset = "ISO-8859-9";
EnableViewState = false;
StringWriter writer = new StringWriter();
HtmlTextWriter html = new HtmlTextWriter(writer);
form1.RenderControl(html);
byte[] bytesInStream = Encoding.GetEncoding("iso-8859-9").GetBytes(writer.ToString());
MemoryStream memoryStream = new MemoryStream(bytesInStream);
string msgBody = "";
string Email = "mail@xxxxxx.org";
SmtpClient client = new SmtpClient("mail.xxxxx.org");
MailMessage message = new MailMessage(Email, "mail@someone.com", "ONLINE APP FORM WITH WORD DOC", msgBody);
Attachment att = new Attachment(memoryStream, "XXXX.doc", "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
message.Attachments.Add(att);
message.BodyEncoding = System.Text.Encoding.UTF8;
message.IsBodyHtml = true;
client.Send(message);
}
