I'm trying to add values to a Paradox table with C#.
The point is that this table contains localized strings, for which the BDE requires the LangDriver ANSII850.
I tried both the OLEDB and ODBC drivers in .NET, but I cannot write correct values to my database; I always get encoding issues.
Example:
// ODBC connection string (using string.Format to set the path)
string connectionBase = @"Driver={{Microsoft Paradox Driver (*.db )}};DriverID=538;Fil=Paradox 5.X;DefaultDir={0};CollatingSequence=ASCII;";
// I tried to put the LangDriver in the CollatingSequence parameter
string connectionBase = @"Driver={{Microsoft Paradox Driver (*.db )}};DriverID=538;Fil=Paradox 5.X;DefaultDir={0};CollatingSequence=ANSII850;";
// I tried the OleDb driver
string connectionBase = @"Provider=Microsoft.Jet.OLEDB.4.0;Extended Properties=Paradox 5.x;Data Source={0};";
Then I try to insert the value "çã á çõ" as a test. Depending on the driver I'm using, I get different results, but the final string is never encoded correctly.
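(For context on why the two sides disagree: code page 850, which the ANSII850 language driver family is based on, and the Windows ANSI code page 1252 assign different byte values to every accented character in that test string. A quick Python sketch illustrates this; which ANSI code page the ODBC/OLEDB side actually uses is an assumption here.)

```python
# Byte values of the test string under DOS code page 850 (the BDE side)
# and Windows code page 1252 (a guess at what the ODBC driver writes).
text = "çã á çõ"

print(text.encode("cp850").hex(" "))   # 87 c6 20 a0 20 87 e4
print(text.encode("cp1252").hex(" "))  # e7 e3 20 e1 20 e7 f5
```

Every accented character maps to a different byte in the two code pages, so bytes written under one convention and read under the other come back garbled.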
Edit:
I finally found a solution, though not an ideal one:
I'm able to switch from one LangDriver to another by calling an external executable written in Delphi. In this case, I'm using ANSII850.
Then I'm able to read data from my Paradox tables, but I still don't get the data in a good format.
Strings from the tables are not encoded with code page 850 either; trying to decode them with the .NET encoding classes just does not work.
Instead, I'm manually tracking the special characters that are read incorrectly and replacing them with the correct UTF-8 characters.
For writing, I do the exact opposite.
It works, but it's still not ideal.
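(As a sketch of why manual character mapping works at all: if the driver decoded the cp850 bytes with the wrong single-byte code page, the damage is reversible with a single re-decode rather than a per-character table. The choice of cp1252 as the "wrong" code page below is an assumption; the real one depends on the driver.)

```python
# Simulate a driver that decoded cp850 bytes as Windows-1252 by mistake,
# then undo the damage with the converse conversion.
original = "çã á çõ"
garbled = original.encode("cp850").decode("cp1252")   # what the app sees

recovered = garbled.encode("cp1252").decode("cp850")  # reverse the mistake
assert recovered == original
```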
Are you sure you're using the BDE? Your examples refer to a lot of Microsoft components.
The BDE used the high byte values for these "special characters" and a code page to interpret them; 850 looks like the one you think is correct. If you can just send a string to the BDE with the hex or decimal codes of the characters you want, you may be able to see whether it prints them correctly.
I have a PHP application that currently stores data in MySQL tables in a non-conventional format (I assume this is because it uses a non-Unicode MySQL connection).
Example, this is one of the customer names as shown in PHP app UI:
DILORIO’S AUTO BODY SHOP
Notice the difference in apostrophe between it and the following:
DILORIO'S AUTO BODY SHOP
The latter uses a standard Latin apostrophe as opposed to the Unicode (I guess) style one.
This name is stored in DB table like so:
DILORIO’S AUTO BODY SHOP
When it is pulled from the DB and displayed in the UI it all looks correct, but the problem arose when I started using the MySql.Data C# connector to pull the same data.
At first I thought I should be able to just pull the value as a byte array and then convert it from Latin-1 (I assumed this is the default for PHP), but none of the existing encodings seemed to get me the result I wanted. (The screenshots of the garbled result and of the field's collation in MySQL are not reproduced here.)
Ideally I want to get rid of all the corrupt data in the DB and fix the PHP connection to use Unicode. But at this point it would be nice to just read what's already in there the same way PHP is able to.
I also tried Encoding.Convert in all the different combinations, but no luck there either.
The text is encoded with Windows-1252, not Latin-1, which is why your attempts to decode it above failed. Once you convert the string to Windows-1252 bytes and then decode those bytes as UTF-8, you should have the correct value:
// note: on .NET 6.0, add 'System.Text.Encoding.CodePages' and call this line of code:
// Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
var windows1252 = Encoding.GetEncoding(1252);
var utf8Bytes = windows1252.GetBytes("DILORIO’S AUTO BODY SHOP");
var correct = Encoding.UTF8.GetString(utf8Bytes);
// correct == "DILORIO’S AUTO BODY SHOP"
I am trying to read a DBF file through ADO using the FoxPro OLEDB driver. I can query fine; however, there are some special characters which do not seem to come through. They are not printable characters, as they disappear when clicked on, but they are definitely not the same via OLEDB as they are in FoxPro.
For example, the following field through Visual FoxPro:
When this is accessed through OLEDB it displays as the following:
I've narrowed this down to the fact that the first string contains the ASCII code 0 (null) character as the 10th character. This is valid, however, so I do not wish to remove it, but whatever I try, the string ends after 9 characters when read with ADO.
You don't show us any code and the image links are broken, so we are left with guesses. I have been using the VFPOLEDB driver from C# for years and do not have this problem. I believe you are describing a problem that exists on the C# side, not the VFP side. In VFP even char(0) is a valid character. In C#, however (the docs are misleading IMO, saying this is not the case, but it is), the string is handled as an ASCIIZ string, where char(0) is accepted as the end of the string. This should be your problem. You could simply read the value as a byte array instead, casting the field to a blob. Something like:
Instead of plain SQL like this:
select myField from myTable
Do like this and cast:
select cast(myField as w) as myField from myTable
EDIT: Images were not broken but blocked for me by my ISP, go figure why.
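(To illustrate the ASCIIZ truncation described above outside of VFP: the field value below is made up, and the snippet only shows why reading raw bytes, as the cast-to-blob trick does, preserves data after an embedded char(0).)

```python
import ctypes

raw = b"ABCDEFGHI\x00XYZ"  # hypothetical field value: char(0) as the 10th byte

# Read through a C-style (NUL-terminated) string API: data after char(0) is lost.
truncated = ctypes.create_string_buffer(raw).value
print(truncated)  # b'ABCDEFGHI'

# Read as raw bytes, as the blob cast does: every byte survives.
print(len(raw))  # 13
```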
I'm trying to make a C# project that reads from a MySQL database.
The data is inserted from a PHP page with UTF-8 encoding. Both the page and the data are UTF-8.
The data itself is Greek words like "Λεπτομέρεια 3".
When fetching the data it looks like "ΛεπτομÎÏεια 3".
I have set 'charset=utf8' in the connection string and also tried the 'set session character_set_results=latin1;' query.
When doing the same with mysql (Linux), MySQL Workbench, or the MySQL native connector for OpenOffice with OpenOffice Base, the data is displayed correctly.
Am I doing something wrong, or what else can I do?
Running the query 'SELECT value, HEX(value), LENGTH(value), CHAR_LENGTH(value) FROM call_attribute;' from inside my program.
It returns :
Value: ΛεπτομÎÏεια 3
HEX(value): C38EE280BAC38EC2B5C38FE282ACC38FE2809EC38EC2BFC38EC2BCC38EC2ADC38FC281C38EC2B5C38EC2B9C38EC2B12033
LENGTH(value): 49
CHAR_LENGTH(value): 24
Any ideas?
You state that the first character of your data is capital lambda, Λ.
The UTF-8 representation of this character is 0xCE 0x9B, whereas the HEX() value starts with C38E, which is indeed the capital I with circumflex displayed in your question.
So I guess the original bug was not in the PHP configuration, and your impression that "data are displayed correctly" was wrong and due to an encoding problem.
Also note that storing Greek data as single-byte characters rather than in Unicode requires the Greek code page ISO 8859-7, not Latin-1.
Most likely, you have an encoding problem here, meaning different applications interpret the binary data as different character sets or encodings. (But lacking PHP and MySQL knowledge, I cannot really help you how to configure correctly).
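The lambda analysis can be checked mechanically: take the UTF-8 bytes of Λ, misread them as a single-byte Windows code page, and encode the result as UTF-8 again. Using Windows-1252 (an assumption; strict Latin-1 would map 0x9B to an invisible control character instead) reproduces the first bytes of the HEX() value:

```python
lam = "Λ"                        # U+039B GREEK CAPITAL LETTER LAMDA
utf8_once = lam.encode("utf-8")  # b'\xce\x9b'

# Misread the two UTF-8 bytes as Windows-1252 (0xCE -> Î, 0x9B -> ›),
# then encode that two-character string as UTF-8 again.
double = utf8_once.decode("cp1252").encode("utf-8")
print(double.hex().upper())  # C38EE280BA, matching the start of HEX(value)
```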
You should try SET NAMES 'utf8' and have a look at this link
I've managed to solve my problem by setting 'skip-character-set-client-handshake' in /etc/my.cnf. After that everything was OK: the encoding of the Greek words was correct and the display was perfect.
One drawback was that I had to re-enter all the data into the database.
I'm writing an Outlook add-in to file emails according to certain parameters.
I am currently storing the Outlook.MailItem.Body property in a varbinary(max) field in SQL Server 2008 R2. I have also enabled FTS on this column.
Currently I store the Body property of the email as a byte array in the database and use Encoding.ASCII.GetBytes() to convert the clear text. I am experiencing some weird results, whereby I occasionally notice ? characters in place of apostrophes and new lines.
I have two questions:
Is this the best method to store text in a database, as a byte array? And is ASCII encoding the best way to achieve this?
I want to handle Unicode strings correctly, is there anything I should be aware of?
I'm not sure whether full-text search works well on varbinary columns, though my instinct says "no", but I can answer the second half of your question.
The reason you're getting odd characters is that Encoding.ASCII.GetBytes() treats the text as ASCII and replaces every character outside the 7-bit range with '?'. .NET strings are UTF-16 internally, so curly apostrophes and Unicode line separators in an email body are exactly the kind of characters that get lost. Use Encoding.UTF8.GetBytes() to get the bytes of the string as UTF-8 instead.
This also answers the second question: is this method useful for Unicode strings? Yes, since you're not storing strings at all. You're storing bytes, which your application happens to know are encoded Unicode strings. SQL Server won't do anything to them, because they're just bytes.
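The '?' substitution is easy to reproduce. As a sketch, Python's ASCII codec with replacement behaves like .NET's Encoding.ASCII here, while UTF-8 keeps the text intact:

```python
body = "It’s a test\u2028line"  # curly apostrophe + Unicode line separator

# ASCII encoding replaces anything outside the 7-bit range with '?'.
ascii_bytes = body.encode("ascii", errors="replace")
print(ascii_bytes)  # b'It?s a test?line'

# UTF-8 keeps every character and round-trips losslessly.
utf8_bytes = body.encode("utf-8")
assert utf8_bytes.decode("utf-8") == body
```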
Since you have to support Unicode characters and handle only text, you should store your data in a column of type nvarchar. That would address both of your problems:
1.) Text is saved as variable-length Unicode character data in the database; you don't need a byte encoder/decoder to retrieve the data.
2.) See 1.)
I have a huge MySQL table which has its rows encoded in UTF-8 twice.
For example, "Újratárgyalja" is stored as something like "Újratárgyalja" (its UTF-8 bytes misread as single-byte characters and encoded as UTF-8 again).
The MySQL .NET connector downloads them this way. I tried lots of combinations with System.Text.Encoding.Convert(), but none of them worked.
Sending set names 'utf8' (or another charset) won't solve it.
How can I decode them from double UTF-8 to UTF-8?
Peculiar problem, but I think I can reproduce it by a suitably-unholy mix of UTF-8 and Latin-1 (not by just two uses of UTF-8 without an interspersed mis-step in Latin-1 though). Here's the whole weird round trip, "there and back again" (Python 2.* or IronPython should both be able to reproduce this):
# -*- coding: utf-8 -*-
uni = u'Újratárgyalja'
enc1 = uni.encode('utf-8')
enc2 = enc1.decode('latin-1').encode('utf-8')
dec3 = enc2.decode('utf-8')
dec4 = dec3.encode('latin-1').decode('utf-8')
for x in (uni, enc1, enc2, dec3, dec4):
print repr(x), x
This is the interesting output...:
u'\xdajrat\xe1rgyalja' Újratárgyalja
'\xc3\x9ajrat\xc3\xa1rgyalja' Újratárgyalja
'\xc3\x83\xc2\x9ajrat\xc3\x83\xc2\xa1rgyalja' Ãjratárgyalja
u'\xc3\x9ajrat\xc3\xa1rgyalja' Ãjratárgyalja
u'\xdajrat\xe1rgyalja' Újratárgyalja
The weird string starting with Ã appears as enc2, i.e. two UTF-8 encodings WITH an interspersed Latin-1 decoding thrown into the mix. And as you can see, it can be undone by the exactly-converse sequence of operations: decode as UTF-8, re-encode as Latin-1, re-decode as UTF-8 again, and the original string is back (yay!).
I believe the normal round-trip properties of both Latin-1 (aka ISO-8859-1) and UTF-8 guarantee that this sequence will work (sorry, no C# at hand to try it in that language right now, but the encoding/decoding sequence should not depend on the specific programming language in use).
When you write "The MySQL .Net connector downloads them this way." there's a good chance this means the MySQL .Net connector believes it is speaking Latin-1 to MySQL, while MySQL believes the conversation is in UTF-8. There's also a chance the column is declared as Latin-1, but actually contains UTF-8 data.
If it's the latter (column labelled Latin-1 but data is actually UTF-8) you will get mysterious collation problems and other bugs if you make use of MySQL's text processing functions, ORDER BY on the column, or other situations where the text "means something" rather than just being bytes sent over the wire.
In either case you should try to fix the underlying problem, not least because it is going to be a complete headache for whoever has to maintain the system otherwise.
You could try using
SELECT CONVERT(`your_column` USING ascii)
FROM `your_table`
at the MySQL query level. This is a stab in the dark, though.