I have a MySQL database with the utf8_general_ci collation.
I'm connecting to the same database with PHP, with the page and file encoded as UTF-8, and there is no problem,
but when connecting to MySQL with C# I get letters like this: غزة.
I edited the connection string to look like this:
server=localhost;password=root;User Id=root;Persist Security Info=True;database=mydatabase;Character Set=utf8
but I still have the same problem.
Server=myServerAddress;Database=myDataBase;Uid=myUsername;Pwd=myPassword; CharSet=utf8;
Note! Use the lower-case value utf8 and not upper-case UTF8, as the latter will fail.
See http://www.connectionstrings.com/mysql
Could you try:
Server=localhost;Port=3306;Database=xxx;Uid=xxx;Pwd=xxxx;charset=utf8;
Edit: I got a new idea:
// Requires: using System.Text;
// Encode a string to UTF-8 bytes
string source = "hello world";
byte[] utf8Bytes = Encoding.UTF8.GetBytes(source);
// Decode the UTF-8 bytes back to a string
string plainText = Encoding.UTF8.GetString(utf8Bytes);
Good luck.
More info about this technique: http://social.msdn.microsoft.com/forums/en-us/csharpgeneral/thread/BF68DDD8-3D95-4478-B84A-6570A2E20AE5
You might need to use the utf8mb4 character set for the column in order to support 4-byte characters like "λ𝛌".
The utf8 charset only supports 1-3 bytes per character and thus can't represent all Unicode characters.
See http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html for more details.
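A quick way to see the difference from C# (a small sketch, not from the original answer; "𝛌", U+1D6CC, is the 4-byte character here):
using System;
using System.Text;

// "𝛌" (U+1D6CC) is outside the Basic Multilingual Plane and needs 4 bytes in UTF-8,
// which MySQL's 3-byte "utf8" charset cannot store; "λ" (U+03BB) needs only 2.
Console.WriteLine(Encoding.UTF8.GetBytes("𝛌").Length); // 4
Console.WriteLine(Encoding.UTF8.GetBytes("λ").Length);  // 2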
CHARSET should be uppercase:
Server=localhost;Port=3306;Database=xxx;Uid=xxx;Pwd=xxxx;CHARSET=utf8;
Just in case someone comes here later.
I needed to create a Seed method using MySQL with EF6 to load a SQL file. After running it I got weird characters in the database, like ? replacing é, ó, á.
SOLUTION:
Make sure to read the file using the right charset, UTF-8 in my case:
var path = System.AppDomain.CurrentDomain.BaseDirectory;
var sql = System.IO.File.ReadAllText(path + "../../Migrations/SeedData/scripts/sectores.sql", Encoding.UTF8);
And then M.Shakeri's reminder:
CHARSET=utf8 in the connection string in web.config, using CHARSET in uppercase and utf8 in lowercase.
Hope it helps.
R.
One thing I found, but haven't had the opportunity to really browse, is the collation charts available here: http://www.collation-charts.org/mysql60/
This will show you which characters are part of a given MySQL collation so you can pick the best option for your dataset.
Setting the charset in the connection string refers to the charset of the queries sent to the server. It does not affect the results returned from the server.
https://dev.mysql.com/doc/connectors/en/connector-net-connection-options.html
One way I have found to specify the charset from the client is to run this statement after opening the connection:
set character_set_results='utf8';
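A minimal sketch of doing that from C# with MySQL Connector/NET (connection details are placeholders):
using MySql.Data.MySqlClient;

using (var conn = new MySqlConnection("server=localhost;database=mydatabase;uid=root;pwd=root;"))
{
    conn.Open();
    // Ask the server to return text results as utf8 for the rest of this session.
    using (var cmd = new MySqlCommand("SET character_set_results = 'utf8'", conn))
    {
        cmd.ExecuteNonQuery();
    }
    // Queries executed on this connection afterwards return utf8-encoded results.
}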
This worked for me:
"datasource=xxx;port=3306;username=xxx;password=xxx;database=xxx;charset=utf8mb4"
Related
I'm trying to make a C# project that reads from a MySQL database.
The data is inserted from a PHP page with UTF-8 encoding; both the page and the data are UTF-8.
The data itself is Greek words like "Λεπτομέρεια 3".
When fetching the data it looks like "ΛεπτομÎÏεια 3".
I have set charset=utf8 in the connection string and also tried a set session character_set_results=latin1; query.
When doing the same with mysql (Linux), MySQL Workbench, or the MySQL native connector for OpenOffice with OpenOffice Base, the data is displayed correctly.
Am I doing something wrong, or what else can I do?
Running the query 'SELECT value, HEX(value), LENGTH(value), CHAR_LENGTH(value) FROM call_attribute;' from inside my program returns:
value: ΛεπτομÎÏεια 3
HEX(value): C38EE280BAC38EC2B5C38FE282ACC38FE2809EC38EC2BFC38EC2BCC38EC2ADC38FC281C38EC2B5C38EC2B9C38EC2B12033
LENGTH(value): 49
CHAR_LENGTH(value): 24
Any ideas?
You state that the first character of your data is capital lambda, Λ.
The UTF-8 representation of this character is 0xCE 0x9B, whereas the HEX() value starts with C38E, which is indeed the capital I with circumflex displayed in your question.
So I guess the original bug was not in the PHP configuration, and your impression that the "data are displayed correctly" was wrong and due to an encoding problem.
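To illustrate the diagnosis, here is a small sketch (my own illustration, not from the original answer) that reproduces the double encoding, assuming a Windows-1252 round trip; on .NET Core/5+ you would first need to register the code-pages encoding provider:
using System;
using System.Text;

// UTF-8 bytes of the original Greek capital lambda: CE 9B
byte[] utf8 = Encoding.UTF8.GetBytes("Λ");

// Misread those bytes as Windows-1252 text ("Î›")...
string misread = Encoding.GetEncoding(1252).GetString(utf8);

// ...then re-encode that text as UTF-8: C3 8E E2 80 BA,
// which is exactly how the HEX() output above starts.
byte[] doubleEncoded = Encoding.UTF8.GetBytes(misread);
Console.WriteLine(BitConverter.ToString(doubleEncoded)); // C3-8E-E2-80-BA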
Also note that the Greek alphabet only requires Latin-7, rather than Latin-1, when storing Greek data as single-byte characters rather than in Unicode.
Most likely, you have an encoding problem here, meaning different applications interpret the binary data as different character sets or encodings. (But lacking PHP and MySQL knowledge, I cannot really help you configure it correctly.)
You should try SET NAMES 'utf8' and have a look at this link
I've managed to solve my problem by setting skip-character-set-client-handshake in /etc/my.cnf. After that everything was OK: the encoding of the Greek words was correct and the display was perfect.
One drawback was that I had to re-enter all the data into the database again.
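For reference, that option goes in the server section of my.cnf (a sketch; the exact file location and section may vary by installation):
[mysqld]
skip-character-set-client-handshake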
I have an ASP.NET C# page and am trying to read a file that contains the following character, ’, and convert it to ' (from slanted apostrophe to straight apostrophe).
FileInfo fileinfo = new FileInfo(FileLocation);
string content = File.ReadAllText(fileinfo.FullName);
//strip out bad characters
content = content.Replace("’", "'");
This doesn't work and it changes the slanted apostrophes into ? marks.
I suspect that the problem is not with the replacement, but rather with the reading of the file itself. When I tried this the naive way (using Word and copy-paste) I ended up with the same results as you; however, examining content showed that the .NET framework believed the character was Unicode character 65533, i.e. the "WTF?" replacement character �, before the string replacement. You can check this yourself by examining the relevant character in the Visual Studio debugger, where it should show the character code:
content[0]; // 65533 '�'
The reason the replace isn't working is simple: content doesn't contain the string you gave it:
content.IndexOf("’"); // -1
As for why the file reading isn't working properly: you are probably using the wrong encoding when reading the file. (If no encoding is specified, the .NET framework will try to determine the correct encoding for you; however, there is no 100% reliable way to do this, so it often gets it wrong.) The exact encoding you need depends on the file itself; in my case the encoding being used was extended ASCII, so to read the file I just needed to specify the correct encoding:
string content = File.ReadAllText(fileinfo.FullName, Encoding.GetEncoding("iso-8859-1"));
(See this question).
You also need to make sure that you specify the correct character in your replacement string. When using "odd" characters in code, you may find it more reliable to specify the character by its character code rather than as a string literal (which may cause problems if the encoding of the source file changes); for example, the following worked for me:
content = content.Replace("\u0092", "'");
My bet is that the file is encoded in Windows-1252. This is almost the same as ISO 8859-1; the difference is that Windows-1252 uses "displayable characters rather than control characters in the 0x80 to 0x9F range" (which is where the slanted apostrophe is located, i.e. 0x92).
//Specify Windows-1252 here
string content = File.ReadAllText(fileinfo.FullName, Encoding.GetEncoding(1252));
//Your replace code will then work as is
content = content.Replace("’", "'");
// This should replace smart single quotes with a straight single quote
// (requires using System.Text.RegularExpressions;)
content = Regex.Replace(content, @"(\u2018|\u2019)", "'");
// However, the better approach seems to be to read the file with the proper encoding and leave the quotes alone
var sreader = new StreamReader(fileinfo.OpenRead(), Encoding.GetEncoding(1252));
If you use String (capitalized) and not string, it should be able to handle any Unicode you throw at it. Try that first and see if that works.
I have an ASP.NET page connected to a MySQL DB.
When I try to insert/update values from the webpage into the DB, the characters are shown in the DB as question marks (I am using stored procedures).
If I write the query directly in the DB, it works and the characters are displayed correctly.
The DB default charset is utf8, and the column collation is utf8_general_ci.
Thanks a lot & have a great weekend :)
Eventually what solved my problem was adding CharSet=utf8 to the connection string.
Thanks a lot, everyone :)
I believe your C# strings are being treated as Unicode instead of UTF-8.
Here is some sample code from a snippet I found some time ago:
System.Text.Encoding utf_8 = System.Text.Encoding.UTF8;
// This is our Unicode string:
string s_unicode = "abcéabc";
// Convert a string to utf-8 bytes.
byte[] utf8Bytes = System.Text.Encoding.UTF8.GetBytes(s_unicode);
// Convert utf-8 bytes to a string.
string s_unicode2 = System.Text.Encoding.UTF8.GetString(utf8Bytes);
MessageBox.Show(s_unicode2);
I'm trying to convert a database that looks like it is encoded in UTF-8 into the Windows-1251 encoding (don't ask, but I need to do this). All of the Russian encoded characters in the DB show up as абвгдÐ. When I pull them out of the DB into strings in my C# app, I still see абвгдÐ. No matter what I try, the string seems to be interpreted as a Latin-1 single-byte string, and I never see my text show up as Russian. What I basically need to do is convert this Latin-1-looking, UTF-8-encoded string into Unicode so that I can later convert it to 1251, but I have not been able to do this successfully. Anyone got any ideas?
Encoding.UTF8.GetString(Encoding.GetEncoding("iso-8859-1").GetBytes(s))
Now you have a normal Unicode string containing Cyrillic.
Note that it is possible that your ‘Latin-1’ misencoded string might actually be a ‘Windows code page 1252’ misencoded string; I can't tell from the given example, as it doesn't use any of the characters that differ between the two encodings. If this is the case, use GetEncoding(1252) instead.
Also, this assumes that the contents of the database are at fault. If the database is supposed to be storing UTF-8 strings but you're pulling them out as if they were Latin-1 (or code page 1252, due to that being the system code page), then you really need to reconfigure your data access layer to use the right encoding. If you're using SQL Server, it's better to start using NVARCHAR.
I am using SQL Server, and all columns are nvarchar. The data was imported with a mysqldump from a DB that was latin1, not utf8, so all the Unicode strings are simply latin1 encoded. In any case, I figured it out, and it's very similar to what you suggested. Here's what I did to convert the latin1-encoded UTF-8 into 1251:
// re-interpret the latin1-misread string as proper UTF-8
str = Encoding.UTF8.GetString(Encoding.GetEncoding("iso-8859-1").GetBytes(str));
// convert from UTF-8 to 1251
str = Encoding.GetEncoding(1251).GetString(
    Encoding.Convert(Encoding.UTF8, Encoding.GetEncoding(1251), Encoding.UTF8.GetBytes(str)));
I have an old MySQL database with its encoding set to UTF-8. I am using the ADO.NET Entity Framework to connect to it.
The strings that I retrieve from it have strange characters where characters like ë are expected.
For example: "ë" comes out as "ë".
I thought I could fix this by converting from UTF-8 to UTF-16:
static string Utf8ToUtf16(string utf8)
{
    return Encoding.Unicode.GetString(
        Encoding.Convert(
            Encoding.UTF8,
            Encoding.Unicode,
            Encoding.UTF8.GetBytes(utf8)));
}
This however doesn't change a thing.
How could I get the data from this database in proper form?
There are two things that you need to do to support UTF-8 in the ADO.NET Entity Framework (or in general when using the MySQL .NET Connector):
Ensure that the collation of your database or table is a UTF-8 collation (i.e. utf8_general_ci or one of its relations).
Add Charset=utf8; to your connection string, for example:
"Server=localhost;Database=test;Uid=test;Pwd=test;Charset=utf8;"
I'm not certain, but the encoding may be case sensitive; I found that CharSet=UTF8; did not work for me.
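For what it's worth, here is a minimal sketch of using such a connection string with the MySQL .NET Connector (the server, credentials, table, and column names are placeholders, not from the original answer):
using System;
using MySql.Data.MySqlClient;

var connectionString = "Server=localhost;Database=test;Uid=test;Pwd=test;Charset=utf8;";
using (var conn = new MySqlConnection(connectionString))
{
    conn.Open();
    using (var cmd = new MySqlCommand("SELECT name FROM example_table", conn))
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            // With the connection charset matching the data, the returned
            // .NET strings need no manual re-encoding.
            Console.WriteLine(reader.GetString(0));
        }
    }
}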
Even if the database is set to UTF-8, you must do the following things to get Unicode fields to work correctly:
Ensure you are using a Unicode field type like NVARCHAR or TEXT CHARSET utf8.
Whenever you insert anything into the field, prefix the literal with the N character to indicate Unicode data, as shown in the examples below.
Whenever you select based on Unicode data, use the N prefix again.
MySqlCommand cmd = new MySqlCommand("INSERT INTO EXAMPLE (someField) VALUES (N'Unicode Data')");
MySqlCommand cmd2 = new MySqlCommand("SELECT * FROM EXAMPLE WHERE someField=N'Unicode Data'");
If the database wasn't configured correctly, or the data was inserted without using the N prefix, it won't be possible to get the correct data out, since it will have been downcast into the Latin-1/ASCII character set.
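As a side note (not part of the answer above), passing the value as a command parameter lets the connector handle the encoding, so no N'...' literal is needed. A rough sketch, reusing the hypothetical EXAMPLE table with a utf8 connection charset:
using MySql.Data.MySqlClient;

var conn = new MySqlConnection("Server=localhost;Database=test;Uid=test;Pwd=test;Charset=utf8;");
conn.Open();
using (var cmd = new MySqlCommand("INSERT INTO EXAMPLE (someField) VALUES (@value)", conn))
{
    // The connector sends the parameter value using the connection's charset,
    // so the Unicode text arrives intact.
    cmd.Parameters.AddWithValue("@value", "Unicode Data");
    cmd.ExecuteNonQuery();
}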
Try setting the encoding with a SET NAMES utf8 query. You can set this parameter in the MySQL config too.
As others have said, this could be a DB issue, but it could also be caused by using an old version of the .NET MySQL connector.
What I actually wanted to comment on is the UTF-8 to UTF-16 conversion. The string you are trying to convert is already Unicode-encoded, so your "ë" characters actually take up 4 bytes (or more) and are no longer, at the point of your conversion, a misrepresentation of the "ë" character. That is why your conversion doesn't do anything.
If you want to do a conversion like that, I think you would have to encode your UTF-8 string as an old-style one-byte-per-character string, using a code page where the byte values of Ã and « actually represent the UTF-8 byte sequence of ë, and then treat the bytes of this new string as a UTF-8 string. Fun stuff.
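If you do want to attempt that repair in C#, here is a rough sketch (my own illustration, assuming Windows-1252 as the single-byte code page):
using System.Text;

// The mojibake "Ã«" is what you get when the UTF-8 bytes of "ë" (C3 AB)
// are decoded as Windows-1252 (or Latin-1).
string mojibake = "Ã«";

// Re-encode the mojibake back to its raw bytes (C3 AB)...
byte[] rawBytes = Encoding.GetEncoding(1252).GetBytes(mojibake);

// ...and decode those bytes as UTF-8 to recover the intended "ë".
string repaired = Encoding.UTF8.GetString(rawBytes);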
Thank you, The Mouth of a Cow. Your solution works, but we still need to convert the characters.
I think this is your problem :)
For converting the characters you can use this code:
System.Text.Encoding utf_8 = System.Text.Encoding.UTF8;
string s = "unicode";
// string to UTF-8 bytes
byte[] utf = System.Text.Encoding.UTF8.GetBytes(s);
// UTF-8 bytes back to string
string s2 = System.Text.Encoding.UTF8.GetString(utf);
"Server=localhost;Database=test;Uid=test;Pwd=test;Charset=utf8;"
It worked with PowerShell 7.2 and MySQL Connector 8.0.29.