read/write unicode data in MySql - c#

I am using a MySQL database and want to be able to read and write Unicode data values, for example French, Greek or Hebrew text.
My client program is C# (.NET Framework 3.5).
How do I configure my database to allow Unicode, and how do I use C# to read and write values as Unicode from MySQL?
Update: 7 Sep. 09
OK, so my schema, table and columns are set to 'utf8' with collation 'utf8_general_ci'. I run 'SET NAMES utf8' when the connection is opened. So far so good... but values are still saved as '???????'.
Any ideas?
The Solution!
OK, so for a C# client to read and write Unicode values, you must include charset=utf8 in the connection string.
For example: server=my_sql_server;user id=my_user;password=my_password;database=some_db123;charset=utf8;
Of course, you should also define the relevant table as utf8 with collation utf8_bin.
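As a sketch of the connection-string fix above, the snippet below sets charset=utf8 on an existing connection string. It uses the framework's generic System.Data.Common.DbConnectionStringBuilder so it compiles without the MySql.Data package; Connector/NET's own MySqlConnectionStringBuilder should expose the same setting through its CharacterSet property. The helper name EnsureUtf8Charset is hypothetical, and the snippet is written as a C# 9+ top-level program.

```csharp
using System;
using System.Data.Common;

// Hypothetical helper: make sure a MySQL connection string asks for utf8.
// DbConnectionStringBuilder matches keys case-insensitively, so an existing
// charset entry is overwritten rather than duplicated.
static string EnsureUtf8Charset(string connectionString)
{
    var builder = new DbConnectionStringBuilder { ConnectionString = connectionString };
    builder["charset"] = "utf8";
    return builder.ConnectionString;
}

Console.WriteLine(EnsureUtf8Charset(
    "server=my_sql_server;user id=my_user;password=my_password;database=some_db123;"));
```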

You have to set the collation for your MySQL schema, tables or even individual columns.
Most of the time, the utf8_general_ci collation is used because its comparisons are case-insensitive and accent-insensitive.
utf8_unicode_ci is also case-insensitive, but it uses more advanced sorting techniques (like sorting the eszett ('ß') together with 'ss'). This collation is slightly slower than the other two.
Finally, utf8_bin compares strings by their binary values, so it is case-sensitive.
If you're using MySQL's Connector/NET (which I recommend), everything should go smoothly.
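To make the _ci versus _bin distinction concrete, here is a loose C# analogy (not MySQL's actual collation code): StringComparer.Ordinal compares code units like a binary collation, while StringComparer.OrdinalIgnoreCase folds case like a case-insensitive one. Accent-insensitivity is not modelled. The practical effect shows up wherever the collation decides equality, for example in a unique index or a GROUP BY.

```csharp
using System;
using System.Collections.Generic;

// Loose analogy: Ordinal ~ *_bin, OrdinalIgnoreCase ~ case-insensitive *_ci.
// Counts how many values the comparer treats as distinct.
static int CountDistinct(IEnumerable<string> values, StringComparer comparer) =>
    new HashSet<string>(values, comparer).Count;

var names = new[] { "Cafe", "cafe", "CAFE" };
Console.WriteLine($"binary-like: {CountDistinct(names, StringComparer.Ordinal)}");
Console.WriteLine($"case-insensitive: {CountDistinct(names, StringComparer.OrdinalIgnoreCase)}");
```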

Try running this statement before any other fetch or send:
SET NAMES UTF8

You need to set the database character set to UTF-8 (if you are using UTF-8), set the collation of the relevant tables/fields to a utf8 collation, execute SET NAMES 'utf8' before running queries, and of course make sure you set the proper encoding in the HTML that displays the output.
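The '?' characters from the question are what a lossy character set conversion produces. This sketch reproduces the effect on the client side with .NET's Encoding.ASCII, which substitutes '?' for any character it cannot encode, and contrasts it with a UTF-8 round trip; MySQL typically stores '?' in the same way when it has to convert text into a character set that lacks those characters. The Hebrew sample string is an arbitrary choice.

```csharp
using System;
using System.Text;

// Encode then decode: any character the encoding cannot hold comes back as '?'.
static string RoundTrip(string text, Encoding encoding) =>
    encoding.GetString(encoding.GetBytes(text));

string hebrew = "\u05E9\u05DC\u05D5\u05DD"; // Hebrew "shalom"
Console.WriteLine(RoundTrip(hebrew, Encoding.ASCII) == "????"); // ASCII: lossy
Console.WriteLine(RoundTrip(hebrew, Encoding.UTF8) == hebrew);  // UTF-8: lossless
```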

Related

OdbcConnection returning Chinese Characters as "?"

I have an Oracle database that stores some data values in Simplified Chinese. I have created an ASP.NET MVC C# web page that is supposed to display this information. I am using an OdbcConnection to retrieve the data; however, when I run my da.Fill(t) command, the values come back as "?".
// using System.Data; using System.Data.Odbc;
// select holds the SQL query text.
using (OdbcConnection SqlConn = new OdbcConnection("Driver={Oracle in instantclient_11_2};Dbq=Database;Uid=Username;pwd=password;"))
using (OdbcCommand cmd = new OdbcCommand(select, SqlConn))
using (OdbcDataAdapter da = new OdbcDataAdapter(cmd))
{
    DataTable t = new DataTable();
    da.Fill(t); // Fill opens and closes the connection itself
    return t;
}
t has the data, but everything that is supposed to be Chinese characters is just a series of "?????".

Problems with character sets are quite common; let me try to give some general notes.
In principle you have to consider four different character set settings.
1 and 2: NLS_CHARACTERSET and NLS_NCHAR_CHARACTERSET
Example: AL32UTF8
They are defined only on the database; you can query them with:
SELECT *
FROM V$NLS_PARAMETERS
WHERE PARAMETER IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');
These settings define which characters (in which format) can be stored in your database, no more, no less. Changing them on an existing database requires some effort (see Character Set Migration and/or Oracle Database Migration Assistant for Unicode).
You can find the character sets supported by Oracle at Character Sets.
3: NLS_LANG
Example: AMERICAN_AMERICA.AL32UTF8
This value is defined only on your client. NLS_LANG has nothing to do with the ability to store characters in a database. It is used to let Oracle know what character set you are using on the client side. When you set the NLS_LANG value (for example to AL32UTF8), you just tell the Oracle database "my client uses character set AL32UTF8"; it does not necessarily mean that your client really uses AL32UTF8! (see #4 below)
NLS_LANG can be defined by the environment variable NLS_LANG or in the Windows Registry at HKLM\SOFTWARE\Wow6432Node\ORACLE\KEY_%ORACLE_HOME_NAME%\NLS_LANG (for 32 bit) or HKLM\SOFTWARE\ORACLE\KEY_%ORACLE_HOME_NAME%\NLS_LANG (for 64 bit), respectively. Depending on your application there might be other ways to specify NLS_LANG, but let's stick to the basics. If no NLS_LANG value is provided, Oracle defaults it to AMERICAN_AMERICA.US7ASCII.
The format of NLS_LANG is NLS_LANG=language_territory.charset. The {charset} part of NLS_LANG is not shown in any system table or view. All components of the NLS_LANG definition are optional, so the following definitions are all valid: NLS_LANG=.WE8ISO8859P1, NLS_LANG=_GERMANY, NLS_LANG=AMERICAN, NLS_LANG=ITALIAN_.WE8MSWIN1252, NLS_LANG=_BELGIUM.US7ASCII.
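The language_territory.charset format, with every part optional, can be illustrated with a small parser. ParseNlsLang is a hypothetical helper, not an Oracle API; the sketch reads only the NLS_LANG environment variable (it ignores the Windows Registry) and falls back to the default named above when the variable is unset.

```csharp
using System;

// Hypothetical helper: split NLS_LANG = language_territory.charset.
// Every part is optional; missing parts come back as "".
static (string Language, string Territory, string Charset) ParseNlsLang(string value)
{
    string charset = "";
    int dot = value.IndexOf('.');
    if (dot >= 0)
    {
        charset = value.Substring(dot + 1);
        value = value.Substring(0, dot);
    }
    int underscore = value.IndexOf('_');
    string language = underscore >= 0 ? value.Substring(0, underscore) : value;
    string territory = underscore >= 0 ? value.Substring(underscore + 1) : "";
    return (language, territory, charset);
}

// Environment variable only; unset means Oracle's default.
string nlsLang = Environment.GetEnvironmentVariable("NLS_LANG") ?? "AMERICAN_AMERICA.US7ASCII";
var parts = ParseNlsLang(nlsLang);
Console.WriteLine($"language='{parts.Language}' territory='{parts.Territory}' charset='{parts.Charset}'");
```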
As stated above, the {charset} part of NLS_LANG is not available in the database in any system table/view or function. Strictly speaking this is true; however, you can run this query:
SELECT DISTINCT CLIENT_CHARSET
FROM V$SESSION_CONNECT_INFO
WHERE (SID, SERIAL#) = (SELECT SID, SERIAL# FROM v$SESSION WHERE AUDSID = USERENV('SESSIONID'));
It should return the character set from your current NLS_LANG setting; however, in my experience the value is often NULL or Unknown, i.e. not reliable.
Find more very useful information here: NLS_LANG FAQ
Note that some technologies do not use NLS_LANG, so setting it has no effect on them, for example:
ODP.NET Managed Driver is not NLS_LANG sensitive. It is only .NET locale sensitive. (see Data Provider for .NET Developer's Guide)
OraOLEDB (from Oracle) always uses UTF-16 (see OraOLEDB Provider Specific Features)
Java-based JDBC (for example SQL Developer) has its own methods to deal with character sets (see Database JDBC Developer's Guide - Globalization Support for further details)
4: The "real" character set of your terminal, your application or the encoding of .sql files
Example: UTF-8
If you work with a terminal program (e.g. SQL*Plus or isql), you can query the code page with the command chcp; on Unix/Linux the equivalent is locale charmap or echo $LANG. You can get a list of all Windows code page identifiers here: Code Page Identifiers. Note that for UTF-8 (chcp 65001) there are some issues; see this discussion.
If you work with .sql files and an editor like TOAD or SQL-Developer you have to check the save options. Usually you can choose values like UTF-8, ANSI, ISO-8859-1, etc.
ANSI means the Windows ANSI code page, typically CP1252. You can check it in your Registry at HKLM\SYSTEM\ControlSet001\Control\Nls\CodePage\ACP or in the National Language Support (NLS) API Reference (Microsoft has removed this reference; retrieve it from the web archive).
How to set all these values?
The most important point is to match NLS_LANG with the "real" character set of your terminal or application, or with the encoding of your .sql files.
Some common pairs are:
CP850 -> WE8PC850
CP1252 or ANSI (in case of "Western" PC) -> WE8MSWIN1252
ISO-8859-1 -> WE8ISO8859P1
ISO-8859-15 -> WE8ISO8859P15
UTF-8 -> AL32UTF8
Or run this query to get some more:
SELECT VALUE AS ORACLE_CHARSET, UTL_I18N.MAP_CHARSET(VALUE) AS IANA_NAME
FROM V$NLS_VALID_VALUES
WHERE PARAMETER = 'CHARACTERSET';
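The pairs listed above can also be kept in code, for example to warn when NLS_LANG does not match the console. This sketch keys them by Windows code page identifier (850, 1252, 28591 for ISO-8859-1, 28605 for ISO-8859-15, 65001 for UTF-8). SuggestNlsLangCharset is a hypothetical helper, the table is deliberately incomplete, and Console.OutputEncoding.CodePage is only a rough stand-in for what chcp reports.

```csharp
using System;
using System.Collections.Generic;

// The pairs listed above, keyed by Windows code page identifier (incomplete on purpose).
var oracleCharsetByCodePage = new Dictionary<int, string>
{
    [850] = "WE8PC850",
    [1252] = "WE8MSWIN1252",
    [28591] = "WE8ISO8859P1",  // ISO-8859-1
    [28605] = "WE8ISO8859P15", // ISO-8859-15
    [65001] = "AL32UTF8",      // UTF-8
};

// Hypothetical helper: Oracle character set for a code page, or null if unknown.
string SuggestNlsLangCharset(int codePage) =>
    oracleCharsetByCodePage.TryGetValue(codePage, out string charset) ? charset : null;

int consoleCodePage = Console.OutputEncoding.CodePage;
Console.WriteLine($"code page {consoleCodePage}: {SuggestNlsLangCharset(consoleCodePage) ?? "no mapping"}");
```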
Some technologies make your life easier: for example, ODP.NET (unmanaged driver) or the ODBC driver from Oracle automatically inherits the character set from the NLS_LANG value, so the condition above is always met.
Is it required to set client NLS_LANG value equal to database NLS_CHARACTERSET value?
No, not necessarily! For example, if you have the database character set NLS_CHARACTERSET=AL32UTF8 and the client character set NLS_LANG=.ZHS32GB18030 then it will work without any problem (provided your client really uses GB18030), although these character sets are completely different. GB18030 is a character set commonly used for Chinese, like UTF-8 it supports all Unicode characters.
If you have, for example, NLS_CHARACTERSET=AL32UTF8 and NLS_LANG=.WE8ISO8859P1, it will also work (again, provided your client really uses ISO-8859-1). However, the database may store characters which your client is not able to display; instead, the client will display a placeholder (e.g. ¿).
Anyway, it is beneficial to have matching NLS_LANG and NLS_CHARACTERSET values, if suitable. If they are equal you can be sure that any character which may be stored in database can also be displayed and any character you enter in your terminal or write in your .sql file can also be stored in database and is not substituted by placeholder.
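Whether a given client character set can represent everything the database stores can be checked up front. This sketch uses a .NET encoding with an exception fallback, so instead of silently substituting a placeholder it reports unrepresentable characters. CanRepresent is a hypothetical helper, and the .NET encoding names only stand in for the corresponding Oracle character sets (iso-8859-1 for WE8ISO8859P1, utf-8 for AL32UTF8).

```csharp
using System;
using System.Text;

// Hypothetical helper: true if every character of text exists in the named encoding.
// The exception fallback replaces the usual silent '?' substitution with an error.
static bool CanRepresent(string text, string encodingName)
{
    Encoding strict = Encoding.GetEncoding(
        encodingName, EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);
    try
    {
        strict.GetBytes(text);
        return true;
    }
    catch (EncoderFallbackException)
    {
        return false;
    }
}

string chinese = "\u4E2D\u6587"; // two Chinese characters
Console.WriteLine(CanRepresent(chinese, "iso-8859-1")); // ~ client with NLS_LANG=.WE8ISO8859P1
Console.WriteLine(CanRepresent(chinese, "utf-8"));      // ~ client with NLS_LANG=.AL32UTF8
```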
Supplement
You often read advice like "The NLS_LANG character set must be the same as your database character set" (also here on SO). This is simply not true and a popular myth!
See also Should the NLS_LANG Setting Match the Database Character Set?
The NLS_LANG character set should reflect the setting of the operating system character set of the client. For example, if the database character set is AL32UTF8 and the client is running on a Windows operating system, then you should not set AL32UTF8 as the client character set in the NLS_LANG parameter because there are no UTF-8 WIN32 clients. Instead, the NLS_LANG setting should reflect the code page of the client. For example, on an English Windows client, the code page is 1252. An appropriate setting for NLS_LANG is AMERICAN_AMERICA.WE8MSWIN1252.
Setting NLS_LANG correctly enables proper conversion from the client operating system character set to the database character set. When these settings are the same, Oracle Database assumes that the data being sent or received is encoded in the same character set as the database character set, so character set validation or conversion may not be performed. This can lead to corrupt data if the client code page and the database character set are different and conversions are necessary.
However, the statement "there are no UTF-8 WIN32 clients" is certainly outdated nowadays!
Here is the proof:
C:\>set NLS_LANG=.AL32UTF8
C:\>sqlplus ...
SQL> SET SERVEROUTPUT ON
SQL> DECLARE
2 CharSet VARCHAR2(20);
3 BEGIN
4 SELECT VALUE INTO Charset FROM nls_database_parameters WHERE parameter = 'NLS_CHARACTERSET';
5 DBMS_OUTPUT.PUT_LINE('Database NLS_CHARACTERSET is '||Charset);
6 IF UNISTR('\20AC') = '€' THEN
7 DBMS_OUTPUT.PUT_LINE ( '"€" is equal to U+20AC' );
8 ELSE
9 DBMS_OUTPUT.PUT_LINE ( '"€" is not the same as U+20AC' );
10 END IF;
11 END;
12 /
Database NLS_CHARACTERSET is AL32UTF8
"€" is not the same as U+20AC
PL/SQL procedure successfully completed.
Both, client and database character sets are AL32UTF8, however the characters do not match. The reason is, my cmd.exe and thus also SQL*Plus use Windows CP1252. Therefore I must set NLS_LANG accordingly:
C:\>chcp
Active code page: 1252
C:\>set NLS_LANG=.WE8MSWIN1252
C:\>sqlplus ...
SQL> SET SERVEROUTPUT ON
SQL> DECLARE
2 CharSet VARCHAR2(20);
3 BEGIN
4 SELECT VALUE INTO Charset FROM nls_database_parameters WHERE parameter = 'NLS_CHARACTERSET';
5 DBMS_OUTPUT.PUT_LINE('Database NLS_CHARACTERSET is '||Charset);
6 IF UNISTR('\20AC') = '€' THEN
7 DBMS_OUTPUT.PUT_LINE ( '"€" is equal to U+20AC' );
8 ELSE
9 DBMS_OUTPUT.PUT_LINE ( '"€" is not the same as U+20AC' );
10 END IF;
11 END;
12 /
Database NLS_CHARACTERSET is AL32UTF8
"€" is equal to U+20AC
PL/SQL procedure successfully completed.
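The byte-level mismatch behind the first run can be reproduced in C#. With NLS_LANG=.AL32UTF8, Oracle expects UTF-8 bytes, but a CP1252 console sends the single byte 0x80 for "€"; conversely, UTF-8 bytes read as CP1252 turn into mojibake. This sketch assumes .NET Core 3.0 or later, where code page 1252 has to be enabled through CodePagesEncodingProvider (on .NET Framework, drop that line).

```csharp
using System;
using System.Text;

// Needed on .NET Core / .NET 5+ for Windows code pages such as 1252.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding cp1252 = Encoding.GetEncoding(1252);

string euro = "\u20AC";
byte[] asCp1252 = cp1252.GetBytes(euro);      // what a CP1252 console actually sends
byte[] asUtf8 = Encoding.UTF8.GetBytes(euro); // what NLS_LANG=.AL32UTF8 promises

Console.WriteLine(BitConverter.ToString(asCp1252));
Console.WriteLine(BitConverter.ToString(asUtf8));

// Reading bytes under the wrong label garbles them.
Console.WriteLine(cp1252.GetString(asUtf8));                 // UTF-8 bytes read as CP1252
Console.WriteLine((int)Encoding.UTF8.GetString(asCp1252)[0]); // CP1252 byte read as UTF-8
```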
Also consider this example:
CREATE TABLE ARABIC_LANGUAGE (
LANG_CHAR VARCHAR2(20),
LANG_NCHAR NVARCHAR2(20));
INSERT INTO ARABIC_LANGUAGE VALUES ('العربية', 'العربية');
You would need to set two different values for NLS_LANG for a single statement - which is not possible.
See also If we have US7ASCII characterset why does it let us store non-ascii characters? or difference between NLS_NCHAR_CHARACTERSET and NLS_CHARACTERSET for Oracle

Parametrized hexadecimal literal in DB2 C# IBM driver

There is a DB2 database.
There is a C#.NET application which uses the IBM.Data.DB2 driver to connect to the database (IBMDB2).
There is a parametrized query (a DB2Command object initialized with):
"SELECT $coid_ref FROM db.$ext WHERE $coid = #coid"
I need to substitute #coid with a hexadecimal literal. For example, it should be executed like this:
"SELECT $coid_ref FROM db.$ext WHERE $coid = x'AA23909F'"
But when I try to add the parameter via command.Parameters.Add("#coid", "AA23909F"), the driver adds it as a string, which leads to an error. How can I solve this?

You are passing in a regular string.
You need to use a hexadecimal literal value.
From what I could find...
command.Parameters.Add("#coid", "\xAA\x23\x90\x9F")
What DB2 platform are you working with? If an EBCDIC platform (IBM i or z/OS) you might have a problem with your string being translated...
But this seems to be a really strange need. Is the column really defined as a 4-byte binary string? (If so, it should look like CHAR(4) FOR BIT DATA, assuming you're not using Unicode.)
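Following the CHAR(4) FOR BIT DATA remark, one alternative to the escaped string is to turn the hex text into the four raw bytes that x'AA23909F' denotes and pass a byte[] as the parameter value. The parsing below is plain .NET and also works on older frameworks; HexToBytes is a hypothetical helper, and whether IBM.Data.DB2 maps a byte[] value to a binary/FOR BIT DATA parameter is an assumption to check against the driver documentation.

```csharp
using System;

// Hypothetical helper: "AA23909F" -> { 0xAA, 0x23, 0x90, 0x9F }, the bytes of x'AA23909F'.
static byte[] HexToBytes(string hex)
{
    if (hex.Length % 2 != 0)
        throw new ArgumentException("Hex string must have an even number of digits.", nameof(hex));
    var bytes = new byte[hex.Length / 2];
    for (int i = 0; i < bytes.Length; i++)
        bytes[i] = Convert.ToByte(hex.Substring(2 * i, 2), 16); // parse one hex pair
    return bytes;
}

byte[] coid = HexToBytes("AA23909F");
Console.WriteLine(BitConverter.ToString(coid));
// Assumption (not verified): pass coid instead of the string, e.g.
// command.Parameters.Add("#coid", coid);
```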

SQLServer nchar and .Net unicode with special F801 character

I have an existing database with existing data whose structure and values I can't change.
In that database there is an nvarchar column that contains values in the twilight Unicode zone starting at F800 and upward.
When I select those values in SQL, or use the SQL function UNICODE, I get the proper values.
When I select the same values in .NET, I get an error value: all the values in that twilight zone become 65533.
I need those values. How can I persuade .NET to give me those values, for example by changing the connection encoding to a custom one, or to UCS-2, etc.?
Here is sample code that demonstrates the problem:
c.CommandText = "select NCHAR(55297)";
using (var r = c.ExecuteReader())
{
    r.Read();
    var result = r.GetString(0); // expected (int)result[0] == 55297, but got 65533
}

55297 is 0xD801, which is a UTF-16 high surrogate and not a valid character on its own. Did you perhaps mean 0xF801, which is 63489? That one appears not to be defined either (it lies in the Private Use Area). Which characters do you want?
If I run "select NCHAR(55297)" in SQL Server Management Studio, I get back the diamond question mark, but if I run "select NCHAR(63489)" I get back a dot of some sort.
If what you want is the character values, you can ask for them directly:
select Unicode(NCHAR(63489))
This returns 63489 (as an integer)
If you want them as a byte array, you can ask for that:
select CONVERT(varbinary(MAX), FieldThatIsAnNvarchar) from ThatTable
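A sketch of why the values come back as 65533, assuming (not verified against SqlClient's internals) that the nvarchar data is decoded as UTF-16 with .NET's default replacing decoder: a lone surrogate such as U+D801 becomes U+FFFD (65533), while a valid Private Use Area character such as U+F801 survives. Reassembling code units from the varbinary bytes, as suggested above, preserves the original value; CodeUnitFromBytes is a hypothetical helper.

```csharp
using System;
using System.Text;

// UTF-16LE bytes as SQL Server would store NCHAR(55297) and NCHAR(63489).
byte[] loneSurrogate = { 0x01, 0xD8 }; // U+D801, a high surrogate with no partner
byte[] privateUse = { 0x01, 0xF8 };    // U+F801, a Private Use Area character

// Decoding to string: the default decoder replaces the lone surrogate with U+FFFD.
Console.WriteLine((int)Encoding.Unicode.GetString(loneSurrogate)[0]);
Console.WriteLine((int)Encoding.Unicode.GetString(privateUse)[0]);

// Reading raw bytes (e.g. from CONVERT(varbinary(MAX), ...)) keeps every code unit.
static int CodeUnitFromBytes(byte[] bytes, int offset) =>
    bytes[offset] | (bytes[offset + 1] << 8); // little-endian, independent of the CPU

Console.WriteLine(CodeUnitFromBytes(loneSurrogate, 0));
```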

After much investigation, I failed to find any way around this. I couldn't find any two-way conversion that would work here.
It seems that some Unicode values are intended for special Unicode scenarios that .NET doesn't support, or supports only partially, in a way that breaks what we need here.

How can I trim string property datatypes from database char datatypes in Fluent NHibernate

I have an SQL database and an Oracle database with the same schema.
Therefore I want to use my model classes for both databases, and all I will do is change the database connection string in the Fluent NHibernate configuration.
Some columns have the database char data type, but in my model classes I have declared them as string; however, when they are returned from queries they are padded with whitespace.
How do I return them trimmed without causing problems when I query the database using these columns, as they will need to match the fixed-length specification?

You can create an implementation of NHibernate.UserTypes.IUserType that will trim the whitespace when fetched from the database and re-pad it when going back to the database.
In your fluent mapping, you just add .CustomType<T> where T is your IUserType implementation.
This article is helpful for properly implementing IUserType. Don't get hung up on methods like Assemble, Disassemble, DeepCopy, Replace -- it doesn't look like you'll ever hit those, even. You're most concerned with NullSafeGet, in which you'll trim, and NullSafeSet in which you'll re-pad.
update
Upon further consideration, I'm not sure you'll actually need to re-pad the value when inserting to the database -- the db itself will enforce the fixed length of the column.
In response to the link you provided in your comment, I think that implementation pretty much gets you there, but you might need to modify it a bit. For one thing, you may not want to trim both leading and trailing whitespace, as Trim() will do. For another, in your override of Equals, you'll want "value" to equal "value   " (its padded form).
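The trim/re-pad logic and the padding-insensitive comparison discussed above might look like this. FromDb, ToDb and PaddedEquals are hypothetical helpers that NullSafeGet, NullSafeSet and an Equals override could delegate to; the IUserType members themselves are not shown because their signatures depend on the NHibernate version. Only trailing spaces are trimmed, in line with the Trim() caveat.

```csharp
using System;

// Hypothetical helpers an IUserType could delegate to.
static string FromDb(string value) => value?.TrimEnd(' ');                       // for NullSafeGet
static string ToDb(string value, int columnLength) => value?.PadRight(columnLength, ' '); // for NullSafeSet

// Equality that ignores trailing padding, so "value" equals "value   ".
static bool PaddedEquals(string a, string b) =>
    string.Equals(a?.TrimEnd(' '), b?.TrimEnd(' '), StringComparison.Ordinal);

Console.WriteLine($"[{FromDb("value   ")}]");
Console.WriteLine($"[{ToDb("value", 8)}]");
Console.WriteLine(PaddedEquals("value", "value   "));
```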

The char data type in SQL databases is padded with whitespace.
If you can modify the database, you can create a view that trims the columns. But a trimmed value won't match the untrimmed value. (In a SQL query, that is.) TRIM() is a standard SQL function, but not all platforms support it. I suppose you might be able to cast it to VARCHAR(); I'm not sure I've ever tried that.
If I were in your shoes, I think I'd cache literally the value the database contains. (That is, I'd store it in a variable of some kind.) I might trim it for display to the user, but any interaction between my application code and the database would be done using the actual values in the database.

Is there an encryption method for a column with a data type of int?

The scenario is that I want to encrypt financial figures in a column with a data type of int in a SQL Server table.
It is a big app, so it is difficult to change the column data type from int to any other data type.
I'm using SQL Server 2005 and ASP.NET C#.
Is there a two-way encryption method for a column with a data type of int?
Could I use a user-defined function in SQL Server 2005, or possibly a C# method?

I'm sorry but I simply can't see the rationale for encrypting numbers in a database. If you want to protect the data from prying eyes, surely SQL Server has security built into it, yes?
In that case, protect the database with its standard security. If not, get a better DBMS (though I'd be surprised if this were necessary).
If you have bits of information from that table that you want to make available (like some columns but not others), use a view, or a trigger to update another table (less secured), or a periodic transfer to that table.

XOR?
:)
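XOR with a fixed key keeps the value an int and reverses itself, which is all it offers: it is obfuscation, not encryption, since one known value/stored pair reveals the key. The key below is hypothetical.

```csharp
using System;

// XOR with a fixed key: stays an int and undoes itself, but is NOT encryption.
// Hypothetical key; a real one should not live in source code.
const int Key = 0x5A3C96E1;

static int XorWithKey(int value, int k) => value ^ k;

int stored = XorWithKey(123456, Key);   // value written to the int column
int restored = XorWithKey(stored, Key); // applying XOR again restores it
Console.WriteLine($"{stored} -> {restored}");
```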

There are a few two way encryption schemes available in .Net.
Simple insecure two-way "obfuscation" for C#
You can either convert the integer to its byte array equivalent or convert it to a base-64 string and encrypt that.
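A sketch of the "convert to bytes and encrypt" suggestion, using AES from System.Security.Cryptography (Aes.Create() defaults to CBC mode with PKCS7 padding). Note the consequence for the question's constraint: the 4-byte int becomes a 16-byte ciphertext block, so the result cannot go back into the int column. Key and IV handling here is deliberately simplistic and hypothetical.

```csharp
using System;
using System.Security.Cryptography;

// Random key and IV generated here; in practice they must be stored and protected.
using Aes aes = Aes.Create();

byte[] Encrypt(int value)
{
    byte[] plain = BitConverter.GetBytes(value); // 4 bytes
    using ICryptoTransform enc = aes.CreateEncryptor();
    return enc.TransformFinalBlock(plain, 0, plain.Length); // padded to one 16-byte block
}

int Decrypt(byte[] cipher)
{
    using ICryptoTransform dec = aes.CreateDecryptor();
    byte[] plain = dec.TransformFinalBlock(cipher, 0, cipher.Length);
    return BitConverter.ToInt32(plain, 0);
}

byte[] stored = Encrypt(123456);
Console.WriteLine(Convert.ToBase64String(stored)); // e.g. for a varchar column
Console.WriteLine(Decrypt(stored));
```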

Well, every bijective (injective and surjective) function from int to int can be used as a way to "encode" an integer.
You could build such a function by creating a random array a of 65536 items with no duplicate entries and using f(i) = a[i] (this covers only the 16-bit values 0..65535). To "decode" your int, you simply create another array b with b[a[x]] = x.
As the others have mentioned, this may not be what you REALLY want to do. =)
Edit: Check out Jim Dennis' comment!
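The table-based bijection described above can be sketched with a Fisher-Yates shuffle of 0..65535 and an inverse array built with b[a[x]] = x. The seed is a hypothetical stand-in for a secret, System.Random is not a cryptographic generator, and, as noted, this only covers 16-bit values.

```csharp
using System;

// Random permutation a of 0..65535 (16-bit values only) and its inverse b.
const int Size = 65536;
var rng = new Random(12345); // hypothetical seed standing in for a secret; not cryptographic

int[] a = new int[Size];
for (int i = 0; i < Size; i++) a[i] = i;
for (int i = Size - 1; i > 0; i--) // Fisher-Yates shuffle
{
    int j = rng.Next(i + 1);       // 0 <= j <= i
    (a[i], a[j]) = (a[j], a[i]);
}

int[] b = new int[Size];
for (int x = 0; x < Size; x++) b[a[x]] = x; // inverse: b[a[x]] = x

int Encode(int value) => a[value];
int Decode(int stored) => b[stored];

Console.WriteLine(Decode(Encode(4242)));
```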

You might want to look at format preserving encryption.
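To illustrate the idea behind format-preserving encryption (ciphertext in the same domain as the plaintext, here int to int), the sketch below is a toy balanced Feistel network over the two 16-bit halves of a 32-bit value, with HMAC-SHA256 as the round function. It is not a standardized FPE mode such as NIST FF1, the round count of 8 is an arbitrary choice, and key storage is left out.

```csharp
using System;
using System.Security.Cryptography;

// Toy Feistel permutation on 32-bit values; NOT a vetted FPE mode.
byte[] key = new byte[32]; // hypothetical key handling
using (var rng = RandomNumberGenerator.Create()) rng.GetBytes(key);
const int Rounds = 8;

// Round function: HMAC of (half, round number), truncated to 16 bits.
ushort Round(ushort half, int round)
{
    using var hmac = new HMACSHA256(key);
    byte[] h = hmac.ComputeHash(new byte[] { (byte)(half >> 8), (byte)half, (byte)round });
    return (ushort)((h[0] << 8) | h[1]);
}

int EncryptInt(int value)
{
    uint v = unchecked((uint)value);
    ushort l = (ushort)(v >> 16), r = (ushort)v;
    for (int i = 0; i < Rounds; i++) // (l, r) -> (r, l ^ F(r))
    {
        ushort t = (ushort)(l ^ Round(r, i));
        l = r;
        r = t;
    }
    return unchecked((int)(((uint)l << 16) | r));
}

int DecryptInt(int stored)
{
    uint v = unchecked((uint)stored);
    ushort l = (ushort)(v >> 16), r = (ushort)v;
    for (int i = Rounds - 1; i >= 0; i--) // undo the rounds in reverse order
    {
        ushort t = (ushort)(r ^ Round(l, i));
        r = l;
        l = t;
    }
    return unchecked((int)(((uint)l << 16) | r));
}

int enc = EncryptInt(123456); // still an int, so it fits the existing column
Console.WriteLine(enc);
Console.WriteLine(DecryptInt(enc));
```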
