I'm using this code:
$(document).ready(function () {
var breadCrumps = $('.breadcrumb');
breadCrumps.find('span').text("<%= ArticleSectionData.title %>");
});
title is a property whose values are Unicode-encoded (I think). They are Greek letters. On the local IIS development server (embedded in Visual Studio) the characters are displayed correctly, but on the test server they appear as:
Σ
Do you know of any solution to this problem?
Thanks for help
EDIT:
I have changed the code a little bit:
breadCrumps.find('span').text(<%= ArticleSectionData.title %>);
And now it works correctly, encoding is frustrating ...
If you are working off a different database in test than in dev, then I suspect the issue is with the data. If you are storing HTML entities (e.g., &#931;) in your database, then you need to use .html(). If you are storing actual Unicode characters (e.g., Σ) in the database, then you need to use .text(). The way to represent Σ in HTML is with &#931;. But if you set the text of an element to &#931;, it displays that literally; the innerHTML of that element would contain &amp;#931;.
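To make the .text() versus .html() difference concrete, here is a rough Python sketch of the two behaviours. It is not jQuery. The function names are made up for illustration, and &#931; is assumed to be the entity form of Σ stored in the database.

```python
import html

def inner_html_after_text(value: str) -> str:
    """Roughly what .text(value) leaves in innerHTML: markup is escaped."""
    return html.escape(value, quote=False)

def displayed_after_html(value: str) -> str:
    """Roughly what the user sees after .html(value): entities are parsed."""
    return html.unescape(value)

stored_entity = "&#931;"   # assumption: the database holds the entity
stored_char = "\u03a3"     # assumption: the database holds the real character

# With .text(), the entity is escaped again and shown literally.
print(inner_html_after_text(stored_entity))
# With .html(), the entity is decoded to the Greek capital sigma.
print(displayed_after_html(stored_entity))
# A real Unicode character survives either path unchanged.
print(inner_html_after_text(stored_char), displayed_after_html(stored_char))
```

So which call is right depends only on which of the two forms the server hands to the page.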
I don't know the root of the problem, but you can use http://www.strictly-software.com/htmlencode to decode the entity (e.g., &#931;) to Σ.
I tried to use UTF-8 and ran into trouble.
I have tried so many things; here are the results I have gotten:
???? instead of Asian characters. Even for European text, I got Se?or for Señor.
Strange gibberish (Mojibake?) such as SeÃ±or for Señor, or a similar run of accented Latin characters for 新浪新闻.
Black diamonds, such as Se�or.
Finally, I got into a situation where the data was lost, or at least truncated: Se for Señor.
Even when I got text to look right, it did not sort correctly.
What am I doing wrong? How can I fix the code? Can I recover the data, if so, how?
This problem plagues the participants of this site, and many others.
You have listed the five main cases of CHARACTER SET troubles.
Best Practice
Going forward, it is best to use CHARACTER SET utf8mb4 and COLLATION utf8mb4_unicode_520_ci. (There is a newer version of the Unicode collation in the pipeline.)
utf8mb4 is a superset of utf8 in that it handles 4-byte utf8 codes, which are needed by Emoji and some of Chinese.
Outside of MySQL, "UTF-8" refers to all size encodings, hence effectively the same as MySQL's utf8mb4, not utf8.
I will try to use those spellings and capitalizations to distinguish inside versus outside MySQL in the following.
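A quick way to see why utf8mb4 matters is to count bytes outside MySQL. This is only a sketch. The helper names are invented here, and the rule it checks (MySQL's utf8 stops at 3 bytes per character) is the one stated above.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes one character takes in UTF-8."""
    return len(ch.encode("utf-8"))

def needs_utf8mb4(text: str) -> bool:
    """True if any character needs 4 bytes, i.e. lies above U+FFFF."""
    return any(ord(ch) > 0xFFFF for ch in text)

for sample in ["e", "\u00e9", "\u65b0", "\U0001F47D"]:
    print(repr(sample), utf8_len(sample), needs_utf8mb4(sample))
```

Anything for which needs_utf8mb4 returns True cannot be stored in a MySQL utf8 column.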
Overview of what you should do
Have your editor, etc. set to UTF-8.
HTML forms should start like <form accept-charset="UTF-8">.
Have your bytes encoded as UTF-8.
Establish UTF-8 as the encoding being used in the client.
Have the column/table declared CHARACTER SET utf8mb4 (Check with SHOW CREATE TABLE.)
<meta charset=UTF-8> at the beginning of HTML
Stored Routines acquire the current charset/collation. They may need rebuilding.
UTF-8 all the way through
More details for computer languages (and its following sections)
Test the data
Viewing the data with a tool or with SELECT cannot be trusted.
Too many such clients, especially browsers, try to compensate for incorrect encodings, and show you correct text even if the database is mangled.
So, pick a table and column that has some non-English text and do
SELECT col, HEX(col) FROM tbl WHERE ...
The HEX for correctly stored UTF-8 will be
For a blank space (in any language): 20
For English: 4x, 5x, 6x, or 7x
For most of Western Europe, accented letters should be Cxyy
Cyrillic, Hebrew, and Farsi/Arabic: Dxyy
Most of Asia: Exyyzz
Emoji and some of Chinese: F0yyzzww
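If you want to know what HEX(col) should look like for a given string, you can compute it outside the database. A minimal Python sketch, with hypothetical helper names, that mirrors the table above:

```python
def utf8_hex(text: str) -> str:
    """The hex MySQL's HEX() would show for correctly stored UTF-8."""
    return text.encode("utf-8").hex().upper()

def lead_byte_class(ch: str) -> str:
    """Classify one character by the first byte of its UTF-8 encoding."""
    first = ch.encode("utf-8")[0]
    if first < 0x80:
        return "1 byte (ASCII)"
    if first < 0xE0:
        return "2 bytes (Cx or Dx)"
    if first < 0xF0:
        return "3 bytes (Ex)"
    return "4 bytes (F0)"

print(utf8_hex("Se\u00f1or"))        # the n-tilde shows up as C3B1
print(lead_byte_class("\u00f1"))     # Western European accent
print(lead_byte_class("\u044f"))     # Cyrillic
print(lead_byte_class("\u65b0"))     # Chinese
print(lead_byte_class("\U0001F47D")) # Emoji
```

Compare the output with what SELECT HEX(col) returns. If the two differ, the stored bytes are not plain UTF-8.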
More details
Specific causes and fixes of the problems seen
Truncated text (Se for Señor):
The bytes to be stored are not encoded as utf8mb4. Fix this.
Also, check that the connection during reading is UTF-8.
Black Diamonds with question marks (Se�or for Señor);
one of these cases exists:
Case 1 (original bytes were not UTF-8):
The bytes to be stored are not encoded as utf8. Fix this.
The connection (or SET NAMES) for the INSERT and the SELECT was not utf8/utf8mb4. Fix this.
Also, check that the column in the database is CHARACTER SET utf8 (or utf8mb4).
Case 2 (original bytes were UTF-8):
The connection (or SET NAMES) for the SELECT was not utf8/utf8mb4. Fix this.
Also, check that the column in the database is CHARACTER SET utf8 (or utf8mb4).
Black diamonds occur only when the browser is set to <meta charset=UTF-8>.
Question Marks (regular ones, not black diamonds) (Se?or for Señor):
The bytes to be stored are not encoded as utf8/utf8mb4. Fix this.
The column in the database is not CHARACTER SET utf8 (or utf8mb4). Fix this. (Use SHOW CREATE TABLE.)
Also, check that the connection during reading is UTF-8.
Mojibake (SeÃ±or for Señor):
(This discussion also applies to Double Encoding, which is not necessarily visible.)
The bytes to be stored need to be UTF-8-encoded. Fix this.
The connection when INSERTing and SELECTing text needs to specify utf8 or utf8mb4. Fix this.
The column needs to be declared CHARACTER SET utf8 (or utf8mb4). Fix this.
HTML should start with <meta charset=UTF-8>.
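The four symptoms can be reproduced outside MySQL, which may help in recognising which one you have. This is a sketch in plain Python, not what the server does internally. cp1252 stands in for MySQL's latin1, and truncate_at_invalid is a hypothetical helper that imitates stopping at the first invalid byte.

```python
utf8_bytes = "Se\u00f1or".encode("utf-8")      # b'Se\xc3\xb1or'
latin1_bytes = "Se\u00f1or".encode("latin-1")  # b'Se\xf1or'

# Mojibake: UTF-8 bytes read as if they were latin1/cp1252.
mojibake = utf8_bytes.decode("cp1252")

# Black diamond: latin1 bytes read as if they were UTF-8.
black_diamond = latin1_bytes.decode("utf-8", errors="replace")

# Question mark: character has no representation in the target charset.
question_mark = "Se\u00f1or".encode("ascii", errors="replace").decode("ascii")

def truncate_at_invalid(data: bytes) -> str:
    """Keep only the prefix that is valid UTF-8 (imitates truncation)."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as err:
        return data[:err.start].decode("utf-8")

truncated = truncate_at_invalid(latin1_bytes)

print(mojibake, black_diamond, question_mark, truncated)
```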
If the data looks correct, but won't sort correctly, then
either you have picked the wrong collation,
or there is no collation that suits your need,
or you have Double Encoding.
Double Encoding can be confirmed by doing the SELECT .. HEX .. described above.
é should come back C3A9, but instead shows C383C2A9
The Emoji 👽 should come back F09F91BD, but comes back C3B0C5B8E28098C2BD
That is, the hex is about twice as long as it should be.
This is caused by converting from latin1 (or whatever) to utf8, then treating those
bytes as if they were latin1 and repeating the conversion.
The sorting (and comparing) does not work correctly because it is, for example,
sorting as if the string were SeÃ±or.
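The hex values quoted for Double Encoding can be reproduced by doing the mis-conversion by hand. A small Python sketch, assuming cp1252 as the stand-in for MySQL's latin1; the function names are made up.

```python
def double_encode(text: str) -> bytes:
    """Encode to UTF-8, misread the bytes as cp1252, encode to UTF-8 again."""
    return text.encode("utf-8").decode("cp1252").encode("utf-8")

def undo_double_encoding(data: bytes) -> str:
    """Reverse the mistake. Only works if every byte maps cleanly in cp1252."""
    return data.decode("utf-8").encode("cp1252").decode("utf-8")

for sample in ["\u00e9", "\U0001F47D"]:
    good = sample.encode("utf-8").hex().upper()
    bad = double_encode(sample).hex().upper()
    print(good, "->", bad)

print(undo_double_encoding(double_encode("Se\u00f1or")))
```

The second function is only meant to show that the damage is reversible in principle. For real tables, use the fixes linked from this answer.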
Fixing the Data, where possible
For Truncation and Question Marks, the data is lost.
For Mojibake / Double Encoding, ...
For Black Diamonds, ...
The Fixes are listed here. (5 different fixes for 5 different situations; pick carefully): http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases
I had similar issues with two of my projects after a server migration. After searching and trying a lot of solutions, I came across this one:
mysqli_set_charset($con,"utf8mb4");
After adding this line to my configuration file, everything works fine!
I found this solution for MySQLi (the PHP mysqli set_charset() function) when I was looking to solve an insert from an HTML query.
I was also searching for the same issue. It took me nearly one month to find the appropriate solution.
First of all, you will have to update your database so that every CHARACTER SET and COLLATION is utf8mb4, or at least one that supports UTF-8 data.
For Java:
When making a JDBC connection, add useUnicode=yes&characterEncoding=UTF-8 as parameters to the connection URL and it will work.
For Python:
Before querying into the database, try enforcing this over the cursor
cursor.execute('SET NAMES utf8mb4')
cursor.execute("SET CHARACTER SET utf8mb4")
cursor.execute("SET character_set_connection=utf8mb4")
If it does not work, happy hunting for the right solution.
Set your code IDE language to UTF-8
Add <meta charset="utf-8"> to the header of the webpage where you collect form data.
Check your MySQL table definition looks like this:
CREATE TABLE your_table (
...
) ENGINE=InnoDB DEFAULT CHARSET=utf8
If you are using PDO, make sure
$options = array(PDO::MYSQL_ATTR_INIT_COMMAND=>'SET NAMES utf8');
$dbL = new PDO($pdo, $user, $pass, $options);
If you already have a large database with the above problem, you can try SIDU to export with the correct charset, and import back with UTF-8.
Depending on how the server is set up, you have to change the encoding accordingly. From what you said, utf8 should work best. However, if you're getting weird characters, it might help to change the webpage encoding to ANSI.
This helped me when I was setting up a PHP MySQLi. This might help you understand more: ANSI to UTF-8 in Notepad++
I've been trying to figure this out for quite a while now and I seem to be getting nowhere.
I've been trying to display special chars on a webpage but I'm having lots of trouble achieving my goal.
I'll use the † symbol as an example to explain my issue.
I'm loading the string from a database so the symbol is stored as \u0086
Let's use the word †est as an example.. When I load the string on CodeBehind I get "\u0086est" and on the webpage I get est instead of the correct symbol.
I've been trying to encode the string in multiple ways but it seems that I'm out of luck as I can't seem to get it to work.
The closest I got was using this:
System.Text.Encoding.GetEncoding(1252).GetString(HttpUtility.UrlDecodeToBytes(myString))
which returned †est ... However I'm not certain if it would be a good idea to specify the codepage like this as I'm loading the strings from a database and other special characters may appear as well.
It would be great if I could get some help.
EDIT:
I'm simply setting a label's text to the string I'm loading from the database.
myLabel.InnerText = myString;
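For what it's worth, \u0086 is a C1 control character, and byte 0x86 is † in Windows-1252, so it looks like cp1252 bytes were read as ISO-8859-1 somewhere before the string reached the page. A sketch of the repair in Python (not C#, and not tested against the actual database); c1_to_cp1252 is a name made up for this example.

```python
def c1_to_cp1252(text: str) -> str:
    """Reinterpret C1 controls (U+0080..U+009F) as Windows-1252 bytes."""
    out = []
    for ch in text:
        if 0x80 <= ord(ch) <= 0x9F:
            try:
                out.append(bytes([ord(ch)]).decode("cp1252"))
            except UnicodeDecodeError:
                out.append(ch)  # byte is undefined in cp1252, keep it
        else:
            out.append(ch)  # everything else is left untouched
    return "".join(out)

print(c1_to_cp1252("\u0086est"))
```

Because only the U+0080..U+009F range is touched, other special characters that were already correct pass through unchanged.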
I'm having problems creating a query string and sending it to another webpage.
The text I'm trying to send is long and has special characters. Here is an example:
Represent a fraction 1/𝘣 on a number line diagram by defining the interval from 0 to 1 as the whole and partitioning it into 𝘣 equal parts. Recognize that each part has size 1/𝘣 and that the endpoint of the part based at 0 locates the number 1/𝘣 on the number line.
I can send this just fine if I hand code it:
<a href="Default.cshtml?standardText=Represent a fraction 1/𝘣 on a number line diagram by defining the interval from 0 to 1 as the whole and partitioning it into 𝘣 equal parts. Recognize that each part has size 1/𝘣 and that the endpoint of the part based at 0 locates the number 1/𝘣 on the number line.">
Link Text
</a>
This goes through without any problems, and I can read the entire Query String on the other side.
But if I am creating the link programmatically, my query string gets cut off right before the first character reference. I am using the following setup in a helper function:
string url = "Default.cshtml";
url += "?standardText=" + standard.text;
Link Text
When I use this, I only get "Understand a Fraction as 1/" and then it stops.
When I look at the page source, the only difference between the links is that the first has actual ampersands and the second has them turned into &amp;
<a href="Default.cshtml?standardText=Understand a fraction 1/𝘣 as the quantity formed by 1 part when a whole is partitioned into 𝘣 equal parts; understand a fraction 𝘢/𝑏 as the quantity formed by 𝘢 parts of size 1/𝘣."
So the problem is not really the spaces, but the fact that the & is being interpreted as starting a new query string parameter.
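The cut-off can be reproduced with any URL parser. A Python sketch follows; the shortened text and the &#120355; reference are assumptions standing in for the real standard text. Once the browser sees a raw & followed by #, the & ends the parameter and the # starts the fragment.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed example: what the href looks like after &amp; is decoded back to &.
raw_url = "Default.cshtml?standardText=fraction 1/&#120355; on a number line"

parts = urlsplit(raw_url)
params = parse_qs(parts.query)

print(parts.query)     # everything up to the '#'
print(parts.fragment)  # the rest is never sent to the server
print(params)          # the value stops right after "1/"
```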
I have tried various things [using HttpUtility.UrlEncode, HttpUtility.UrlEncodeUnicode, Html.Raw, trying to replace spaces with "+"], but the problem isn't with the spaces, it's with how the character references are being handled. When I tried HttpUtility.UrlEncode I got a double-encoding security error.
On the advice of OmG I tried replacing all the &s, #s, and /s using:
url = url.Replace("&","%26");
url = url.Replace("#","%23");
url = url.Replace("/","%2F");
This led to the following link:
All Items
And now when I click on the link I get a different security warning/error:
A potentially dangerous Request.QueryString value was detected from the client (standardText="...raction 1/𝘣 as the qua...").
I don't see why it is so hard to send character references through a QueryString. Is there a way to prevent Razor from converting all my &s to &amp;? The address works fine when it is just plain "&"s.
Update: using URLDecode() on the string does not affect its character entity references, so when I try to decode the string then re-encode it, I still get the double-escape security warning.
Update: on the suggestion of @MikeMcCaughan, I tried using JS, but I am not very knowledgeable about mixing JS and Razor. I tried creating a link by dropping a script into the body like so:
<script type="text/javascript">
var a = document.createElement('a');
var linkText = document.createTextNode("my title text");
a.appendChild(linkText);
a.title = "my title text";
a.href = encodeURIComponent(@url);
document.body.appendChild(a);
</script>
But no link showed up, so I'm obviously doing it wrong.
For reference, when I try to use @Html.Raw(url),
Link Text
The &s are still turned into &amp;s. The link renders as:
Link text
One simple solution is replacing the special characters by their encoding which can be accessed from here.
As you can find, replace in the string & with %26 using .replace for string. Also, replace / with %2F, # with %23, ; with %3B, and space with %20.
Also, you can do this in C# with the following function:
Server.UrlEncode("<The Url>")
and in JavaScript with the following function (encodeURI leaves &, #, / and ; unescaped, so it is not enough for a query-string value):
encodeURIComponent("<The Url>")
As for the double-encoding error: to prevent it, make sure no part of the string has already been encoded before you pass it to Server.UrlEncode.
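Here is the same idea as a Python sketch, in case it helps to see the exact output. It is not the ASP.NET API. encode_query_value is a made-up helper, and the sample value with &#120355; is an assumption about what the stored text looks like.

```python
import html
from urllib.parse import quote, unquote

def encode_query_value(value: str) -> str:
    """Percent-encode a value so that &, #, /, ; and spaces are all escaped."""
    return quote(value, safe="")

sample = "1/&#120355; on"  # assumed shape of the stored text

once = encode_query_value(sample)
twice = encode_query_value(once)  # this is the double-encoding to avoid

print(once)
print(twice)          # every % has become %25
print(unquote(once))  # decoding once gives the original back

# Alternative: decode the character reference first, then encode the
# real character as UTF-8 bytes.
print(encode_query_value(html.unescape(sample)))
```

Encode exactly once, on the raw value, and only the value (not the whole URL).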
I'm trying to make a c# project that reads from a MySQL database.
The data are inserted from a php page with utf-8 encoding. Both page and data is utf-8.
The data itself is Greek words like "Λεπτομέρεια 3".
When fetching the data it looks like "Î›ÎµÏ€Ï„Î¿Î¼Î­ÏÎµÎ¹Î± 3".
I have set 'charset=utf8' in the connection string and also tried with 'set session character_set_results=latin1;' query.
When doing the same with mysql (linux), MySQL Workbench, MySQL native connector for OpenOffice with OpenOffice Base, the data are displayed correctly.
Am I doing something wrong, or what else can I do?
Running the query 'SELECT value, HEX(value), LENGTH(value), CHAR_LENGTH(value) FROM call_attribute;' from inside my program.
It returns :
Value:
Î›ÎµÏ€Ï„Î¿Î¼Î­ÏÎµÎ¹Î± 3
HEX(value) :
C38EE280BAC38EC2B5C38FE282ACC38FE2809EC38EC2BFC38EC2BCC38EC2ADC38FC281C38EC2B5C38EC2B9C38EC2B12033
LENGTH(value) :
49
CHAR_LENGTH(value) :
24
Any ideas???
You state that the first character of your data is capital lambda, Λ.
The UTF-8 representation of this character is 0xCE 0x9B, whereas the HEX() value starts with C38E, which is indeed capital I with circumflex, as displayed in your question.
So I guess the original bug was not in the PHP configuration, and your impression that "data are displayed correctly" was wrong and due to an encoding problem.
Also note that Greek text stored as single-byte characters rather than in Unicode needs ISO-8859-7 (MySQL's greek character set), not Latin-1.
Most likely, you have an encoding problem here, meaning different applications interpret the binary data as different character sets or encodings. (But lacking PHP and MySQL knowledge, I cannot really help you how to configure correctly).
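To check that reading, the HEX() value from the question can be decoded by hand. A Python sketch follows. It assumes the extra conversion went through MySQL's latin1 (approximated here by cp1252, with a plain Latin-1 fallback for bytes cp1252 leaves undefined); the helper names are invented.

```python
HEX_VALUE = (
    "C38EE280BAC38EC2B5C38FE282ACC38FE2809EC38EC2BFC38EC2BC"
    "C38EC2ADC38FC281C38EC2B5C38EC2B9C38EC2B12033"
)

def to_single_bytes(text: str) -> bytes:
    """Map each character back to the single byte it was misread from."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            out += ch.encode("latin-1")  # e.g. U+0081, undefined in cp1252
    return bytes(out)

def undo_double_encoding(hex_value: str) -> str:
    """Decode the stored bytes once, then undo the mistaken conversion."""
    once = bytes.fromhex(hex_value).decode("utf-8")
    return to_single_bytes(once).decode("utf-8")

stored = bytes.fromhex(HEX_VALUE)
print(len(stored), len(stored.decode("utf-8")))  # LENGTH and CHAR_LENGTH
print(undo_double_encoding(HEX_VALUE))
```

The byte and character counts match the 49 and 24 reported in the question, and one round of undoing gives back the Greek text, so the column holds double-encoded UTF-8.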
You should try SET NAMES 'utf8' and have a look at this link
I've managed to solve my problem by setting 'skip-character-set-client-handshake' in /etc/my.cnf. After that everything was OK: the encoding of the Greek words was correct and the display was perfect.
One drawback was that I had to re-enter all the data into the database again.
I'm revamping an old .net 2 website, to get the look and feel of our new CI. Since there was money left over, I was told to review the code behind as well.
I have now run into a serious problem with the charset: on almost all pages the German "special" characters like ß ä ö ü are rendered correctly. But on one page every special character is rendered like a normal one. In this case ö --> o; ä --> a; ß --> ?
The text the query is grabbing from the database is rendered correctly in the debugger, but gets messed up as soon as its rendered in the browser.
I've set the charset to ISO-8859-1 in the master page as well as in web.config.
Help is much appreciated - thanks in advance.
Marco
Have you set the metatag in the head section of the rendered HTML?
I.e.
<meta content="text/html; charset=iso-8859-1" http-equiv="Content-Type">
Ah, sorry! I've misread the line about the masterpage.
The problem wasn't code related at all.
The admin who set up the machine didn't use the normal Oracle client we normally use. Instead he just copied over the Instant Client, set up TNSNAMES.ORA and was done with it.
This is why there were never any Oracle entries in the registry telling the client which charset to use.
Instead of bothering with that, I just pushed it through as an environment variable:
NLS_LANG = GERMAN_GERMANY.<charset>
Problem solved.